Similar Articles
20 similar articles found (search time: 31 ms)
1.
The diagnosis problem for discrete event systems consists in deciding whether some fault event occurred or not in the system, given partial observations on the run of that system. Diagnosability checks whether a correct diagnosis can be issued in bounded time after a fault, for all faulty runs of that system. This problem appeared two decades ago, and numerous facets of it have been explored, mostly for permanent faults. It is known, for example, that diagnosability of a system can be checked in polynomial time, while the construction of a diagnoser is exponential. The present paper examines the case of transient faults, which can appear and be repaired. Diagnosability in this setting means that the occurrence of a fault should always be detected in bounded time, but also before the fault is repaired, in order to prepare for the detection of the next fault or to take corrective measures while they are needed. Checking this notion of diagnosability is proved to be PSPACE-complete. It is also shown that faults can be reliably counted provided the system is diagnosable for faults and for repairs.
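The abstract contrasts its PSPACE-complete transient-fault notion with the classic result that permanent-fault diagnosability is checkable in polynomial time. The sketch below illustrates only that classic check, using the well-known twin-plant (verifier) construction. It is not the paper's transient-fault procedure. It assumes a live language, no cycles of unobservable events, and unobservable fault events. The automaton encoding and the function name `diagnosable` are illustrative choices.

```python
from collections import defaultdict

def diagnosable(init, transitions, observable, faults):
    """Twin-plant (verifier) check of permanent-fault diagnosability.

    Assumes the language is live, there is no cycle of unobservable events,
    and fault events are unobservable. transitions: iterable of (src, event, dst).
    """
    succ = defaultdict(list)
    for src, ev, dst in transitions:
        succ[src].append((ev, dst))

    def moves(state):
        x1, f1, x2, f2 = state
        for ev, d in succ[x1]:
            if ev not in observable:      # copy 1 moves alone on a silent event
                yield (d, f1 or ev in faults, x2, f2)
        for ev, d in succ[x2]:
            if ev not in observable:      # copy 2 moves alone on a silent event
                yield (x1, f1, d, f2 or ev in faults)
        for ev1, d1 in succ[x1]:
            if ev1 in observable:         # both copies must agree on observations
                for ev2, d2 in succ[x2]:
                    if ev2 == ev1:
                        yield (d1, f1, d2, f2)

    start = (init, False, init, False)
    seen, stack = {start}, [start]
    while stack:                          # explore the reachable verifier states
        for nxt in moves(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)

    # Ambiguous states: copy 1 has seen a fault, copy 2 has not, yet both
    # produced the same observations. By symmetry one orientation suffices.
    amb = {s for s in seen if s[1] and not s[3]}
    changed = True
    while changed:                        # peel states that cannot stay ambiguous
        changed = False
        for s in list(amb):
            if not any(t in amb for t in moves(s)):
                amb.discard(s)
                changed = True
    return not amb                        # a surviving ambiguous cycle => not diagnosable
```

In the first system below, a fault is followed by `a` and a fault-free run by `b`, so the fault is diagnosable. In the second, both branches emit `a` forever, so it is not.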

2.
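Transient faults caused by cosmic-ray radiation have long been one of the foremost challenges in space computing. With the continued progress of integrated-circuit fabrication processes, the performance of modern processors has risen dramatically, but their dependability is increasingly threatened by transient faults. Current fault-tolerance techniques against transient faults fall broadly into two categories: hardware-based and software-based. Compared with the former, the latter has attracted considerable attention because of its advantages in implementation cost and flexibility. This paper first surveys...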

3.
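Since transient faults are the main type of fault leading to control-system accidents, transient-fault recovery is an important means of ensuring system safety. First, current methods for transient-fault recovery based on active redundancy and on system-model analysis are introduced. Then, research applying these techniques to transient-fault recovery and safe control at the communication-network, network-node, and system levels of networked control systems is reviewed. Finally, development trends in transient-fault recovery and safe-control methods for networked control systems are discussed.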

4.
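This paper surveys the current mainstream software fault-tolerance techniques. It begins with traditional software fault tolerance, introducing design diversity and data diversity. It then introduces mainstream newer techniques, such as reconfiguration and recovery, error detection by instruction duplication, and SWIFT, and analyzes them from the perspective of using software fault tolerance to handle transient hardware faults in embedded systems. Finally, based on a comparison of these techniques, it discusses possible future directions for software fault tolerance.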
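Instruction-duplication techniques such as SWIFT compute each value twice and compare the copies only at points where state leaves the processor, such as stores. The Python sketch below mimics that idea at a high level. The names `scaled_sum`, `checked_store`, and the `flip_bit` fault-injection hook are hypothetical. A real implementation works at the compiler/instruction level, not in Python.

```python
class FaultDetected(Exception):
    """Raised when the master and shadow copies of a value disagree."""

def checked_store(memory, addr, value, shadow_value):
    # SWIFT-style: validate duplicated values only at the store point.
    if value != shadow_value:
        raise FaultDetected(f"mismatch before store to {addr!r}")
    memory[addr] = value

def scaled_sum(memory, xs, k, flip_bit=None):
    """Compute sum(k * x) twice (master and shadow) and check before storing.

    flip_bit is a hypothetical hook that injects a single-bit transient
    fault into the master copy so the check can be exercised.
    """
    acc, acc_shadow = 0, 0
    for x in xs:
        acc += k * x          # master instruction stream
        acc_shadow += k * x   # duplicated (shadow) instruction stream
    if flip_bit is not None:
        acc ^= 1 << flip_bit  # simulated soft error in a register
    checked_store(memory, "result", acc, acc_shadow)
    return memory["result"]
```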

5.
6.
Higher transistor counts, lower voltage levels, and reduced noise margin increase the susceptibility of multicore processors to transient faults. Redundant hardware modules can detect such faults, but software techniques are more appealing for their low cost and flexibility. Recent software proposals have not achieved widespread acceptance because they either increase register pressure, double memory usage, or are too slow in the absence of hardware extensions. This paper presents DAFT, a fast, safe, and memory efficient transient fault detection framework for commodity multicore systems. DAFT replicates computation across multiple cores and schedules fault detection off the critical path. Where possible, values are speculated to be correct and only communicated to the redundant thread at essential program points. DAFT is implemented in the LLVM compiler framework and evaluated using SPEC CPU2000 and SPEC CPU2006 benchmarks on a commodity multicore system. Evaluation results demonstrate that speculation allows DAFT to improve the performance of software redundant multithreading by 2.17× with no degradation of fault coverage.
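The core idea behind this kind of redundant multithreading is simple. A leading thread runs ahead on the assumption that its values are correct, while a trailing thread recomputes them and checks only the values forwarded at essential points. The sketch below shows that idea in Python with a queue. It is not DAFT's LLVM implementation. The `corrupt` flag is a hypothetical fault-injection hook. Because of the GIL, Python threads here do not actually run on separate cores.

```python
import queue
import threading

def leading(xs, q, corrupt=False):
    """Leading thread: computes the result and forwards it only at the
    'store' point, without waiting for the check (speculation)."""
    total = 0
    for x in xs:
        total += x * x
    if corrupt:
        total ^= 1            # simulated single-bit transient fault (hypothetical hook)
    q.put(total)              # value communicated at an essential program point
    q.put(None)               # end-of-stream marker
    return total

def trailing(xs, q, verdicts):
    """Trailing thread: redundantly recomputes and checks forwarded values."""
    total = 0
    for x in xs:
        total += x * x
    while (value := q.get()) is not None:
        verdicts.append(value == total)

def run_redundant(xs, corrupt=False):
    q = queue.Queue()
    verdicts = []
    checker = threading.Thread(target=trailing, args=(xs, q, verdicts))
    checker.start()
    result = leading(xs, q, corrupt)
    checker.join()            # detection completes off the leading thread's path
    return result, all(verdicts)
```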

7.
Diagnosis of continuous valued systems in transient operating regions   Total citations: 1, self-citations: 0, citations by others: 1
The complexity of present-day embedded systems (continuous processes controlled by digital processors), and the increased demands on their reliability motivate the need for monitoring and fault isolation capabilities in the embedded processors. This paper develops monitoring, prediction, and fault isolation methods for abrupt faults in complex dynamic systems. The transient behavior in response to these faults is analyzed in a qualitative framework using parsimonious topological system models. Predicted transient effects of hypothesized faults are captured in the form of signatures that specify future faulty behavior as higher-order time-derivatives. The dynamic effects of faults are analyzed by a progressive monitoring scheme until transient analysis mechanisms have to be suspended in favor of steady-state analysis. This methodology has been successfully applied to monitoring of the secondary sodium cooling loop of a fast breeder reactor.
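Signature-based isolation keeps a fault hypothesis only while its predicted qualitative signs, for the deviation and its higher-order derivatives, agree with what monitoring has observed so far. The sketch below shows that pruning step. The fault names, measurements, and signs in `SIGNATURES` are purely hypothetical. It does not model the paper's topological models or its transition to steady-state analysis.

```python
# Hypothetical signatures: predicted signs of (deviation, 1st derivative,
# 2nd derivative) of each measurement immediately after an abrupt fault.
SIGNATURES = {
    "pump_degradation": {"flow": ("-", "-", "+"), "temp": ("0", "+", "-")},
    "pipe_blockage":    {"flow": ("-", "+", "-"), "temp": ("0", "+", "+")},
    "heater_fault":     {"flow": ("0", "0", "0"), "temp": ("-", "+", "-")},
}

def consistent_faults(observations):
    """Keep every hypothesis whose predicted signature agrees with the
    derivative orders observed so far (a prefix per measurement)."""
    return sorted(
        fault
        for fault, sig in SIGNATURES.items()
        if all(sig[m][: len(obs)] == tuple(obs) for m, obs in observations.items())
    )
```

Progressive monitoring then amounts to calling `consistent_faults` again as each further derivative order is estimated. For example, observing only a negative flow deviation leaves two candidates, and a subsequent positive flow slope leaves one.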

8.
Regression testing is an important activity in the software life cycle, but it can also be very expensive. To reduce the cost of regression testing, software testers may prioritize their test cases so that those which are more important, by some measure, are run earlier in the regression testing process. One potential goal of test case prioritization techniques is to increase a test suite's rate of fault detection (how quickly, in a run of its test cases, that test suite can detect faults). Previous work has shown that prioritization can improve a test suite's rate of fault detection, but the assessment of prioritization techniques has been limited primarily to hand-seeded faults, largely due to the belief that such faults are more realistic than automatically generated (mutation) faults. A recent empirical study, however, suggests that mutation faults can be representative of real faults and that the use of hand-seeded faults can be problematic for the validity of empirical results focusing on fault detection. We have therefore designed and performed two controlled experiments assessing the ability of test case prioritization techniques to improve the rate of fault detection, measured relative to mutation faults. Our results show that prioritization can be effective relative to the faults considered, and they expose ways in which that effectiveness can vary with characteristics of faults and test suites. More importantly, a comparison of our results with those collected using hand-seeded faults reveals several implications for researchers performing empirical studies of test case prioritization techniques in particular and testing techniques in general.
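A widely used way to quantify a test order's "rate of fault detection" is the APFD metric (Average Percentage of Faults Detected). For n tests and m faults, with TF_i the 1-based position of the first test that reveals fault i, APFD = 1 − (TF_1 + … + TF_m)/(n·m) + 1/(2n). The sketch below computes it. The abstract does not say which metric the experiments used, so treat this as an illustration of the standard definition. The test and fault identifiers are made up.

```python
def apfd(order, detects):
    """Average Percentage of Faults Detected for a test execution order.

    order:   list of test ids in execution order.
    detects: dict mapping test id -> set of fault ids that test reveals.
    Assumes every fault is revealed by at least one test in `order`.
    """
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)
    first = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            first.setdefault(fault, pos)  # keep the earliest revealing position
    return 1 - sum(first[f] for f in faults) / (n * m) + 1 / (2 * n)
```

A good order that runs `T4` first scores 0.875. The reverse-style order `T3, T1, T2, T4` scores only 0.375.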

9.
Algorithm-based fault tolerance (ABFT) is a technique which improves the reliability of a multiprocessor system by providing concurrent error detection and fault location capability to it. It encodes data at the system level and modifies the algorithm to operate on the encoded data in order to expose both transient and permanent faults in any processor. Previous work in this area addresses only the fault detection and location part of the problem. However, if spare processors are not available, then after a faulty processor has been located, the work initially assigned to it has to be mapped to some nonfaulty processors in the system in such a way that the fault tolerance capability of the system is still maintained with as small a degradation in performance as possible. In this paper, we propose an integrated deterministic solution to the above problem which combines concurrent error detection and fault location with graceful degradation. There exists no previous deterministic ABFT method for the design of general t-fault locating systems, even for the case of t=1. We propose a general method for designing one-fault locating/s-fault detecting systems. We use an extended model for representing ABFT systems. This model considers the processors computing the checks to be a part of the ABFT system, so that faults in the check computing processors can also be detected and located using a simple diagnosis algorithm, and the checks can be mapped to other nonfaulty processors in the system.
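The textbook example of "encoding data and modifying the algorithm to operate on the encoded data" is checksum matrix multiplication. A column-checksum A times a row-checksum B yields a full-checksum product, and a single corrupted element shows up as exactly one inconsistent row and one inconsistent column. The sketch below is that classic scheme, using integer arithmetic so comparisons are exact. It does not reproduce the paper's integrated method with graceful degradation.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def column_checksum(A):
    """Append a row holding the sum of each column of A."""
    return A + [[sum(col) for col in zip(*A)]]

def row_checksum(B):
    """Append a column holding the sum of each row of B."""
    return [row + [sum(row)] for row in B]

def locate_error(C):
    """Check a full-checksum product C. Returns None if consistent; otherwise
    locates and corrects a single faulty data element and returns (i, j)."""
    n_rows, n_cols = len(C) - 1, len(C[0]) - 1
    bad_rows = [i for i in range(n_rows) if sum(C[i][:n_cols]) != C[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(C[i][j] for i in range(n_rows)) != C[n_rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Recover the element from its row checksum and the other row entries.
        C[i][j] = C[i][n_cols] - (sum(C[i][:n_cols]) - C[i][j])
        return (i, j)
    raise ValueError("multiple or checksum-only errors: not locatable by this sketch")
```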

10.
The authors demonstrate the need to address fault latency in highly reliable real-time control computer systems. It is noted that the effectiveness of all known recovery mechanisms is greatly reduced in the presence of multiple latent faults. The presence of multiple latent faults increases the possibility of multiple errors, which could result in coverage failure. The authors present experimental evidence indicating that the duration of fault latency is dependent on workload. A synthetic work generator is used to vary the workload, and a hardware fault injector is applied to inject transient faults of varying durations. This method makes it possible to derive the distribution of fault latency duration. Experimental results obtained from the fault-tolerant multiprocessor at the NASA Airlab are presented and discussed.

11.
To maintain the efficient and reliable operation of power systems, it is extremely important that transmission line faults be detected and located reliably and accurately. A number of mathematical and intelligent techniques are available in the literature for estimating the fault location. However, their results are not satisfactory due to the wide variation in operating conditions, such as system loading level, fault inception instant, fault resistance, and the dc offset and harmonic content of the transient signal of the faulted transmission line. In view of the above, a new approach based on a generalized neural network (GNN) with wavelet transform is presented for fault location estimation. The wavelet transform is used to extract features of the faulty current signals in terms of standard deviation. The obtained features are used as input to the GNN model for estimating the location of the fault in a given transmission system. Results obtained from the GNN model are compared with an ANN and well-established mathematical models and are found to be more accurate.
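The feature-extraction step (wavelet decomposition of the fault current, summarized by the standard deviation of the coefficients) can be sketched as follows. The abstract names neither the mother wavelet nor the number of levels. The orthonormal Haar wavelet and three levels are assumptions chosen because they need no third-party library. The resulting vector would feed the GNN, which is not shown.

```python
import math
import statistics

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; len(signal) must be even."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_std_features(signal, levels=3):
    """Standard deviation of the detail coefficients at each level.

    len(signal) must be divisible by 2**levels.
    """
    feats, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(statistics.pstdev(detail))
    return feats
```

A steady (constant) current yields all-zero features, while a burst of high-frequency content, as in a fault transient, raises the first-level feature.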

12.
Adaptive compensation for infinite number of actuator failures or faults   Total citations: 1, self-citations: 0, citations by others: 1
It is both theoretically and practically important to investigate the problem of accommodating an infinite number of actuator failures or faults in controlling uncertain systems. However, no results are yet available on developing adaptive controllers that address this problem. In this paper, a new adaptive failure/fault compensation control scheme is proposed for parametric strict feedback nonlinear systems. The techniques of nonlinear damping and parameter projection are employed in the design of controllers and parameter estimators, respectively. It is proved that the boundedness of all closed-loop signals can still be ensured in the case of an infinite number of failures or faults, provided that the time interval between two successive changes of failure/fault pattern is bounded below by an arbitrary positive number. The performance of the tracking error in the mean square sense with respect to the frequency of failure/fault pattern changes is also established. Moreover, asymptotic tracking can be achieved when the total number of failures and faults is finite.
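Parameter projection keeps an adaptive estimate inside a known bounded set, so it cannot drift. In a scalar discrete-time setting, projecting onto an interval is simply clamping. The sketch below shows this on a hypothetical first-order plant dy/dt = θ·y² + u with unknown θ. It uses certainty-equivalence control, a gradient adaptive law, and Euler integration. It illustrates the projection ingredient only. It is not the paper's backstepping design for strict-feedback systems with actuator failures.

```python
def project(theta, lo, hi):
    """Euclidean projection of a scalar estimate onto [lo, hi]."""
    return min(max(theta, lo), hi)

def simulate(theta_true=2.0, lo=-5.0, hi=5.0, k=2.0, gamma=5.0,
             dt=1e-3, steps=10_000, y0=1.0):
    """Hypothetical plant dy/dt = theta*phi(y) + u with phi(y) = y**2.

    Control u = -theta_hat*phi(y) - k*y; adaptive law
    dtheta_hat/dt = gamma*phi(y)*y, followed by projection onto [lo, hi].
    With this law V = y**2/2 + (theta - theta_hat)**2/(2*gamma) satisfies
    dV/dt = -k*y**2 in continuous time (when theta_true lies in [lo, hi]).
    """
    y, theta_hat = y0, 0.0
    for _ in range(steps):
        phi = y * y
        u = -theta_hat * phi - k * y
        y += dt * (theta_true * phi + u)                        # Euler step of the plant
        theta_hat = project(theta_hat + dt * gamma * phi * y, lo, hi)
    return y, theta_hat
```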

13.
In this paper we consider a model-based fault detection and isolation problem for linear time-invariant dynamic systems subject to faults and disturbances. We use a state observer scheme that cancels the system dynamics and defines a residual vector signal that is sensitive only to faults and disturbances. We then design a stable fault detection and isolation filter such that the H∞-norm of the transfer matrix function from disturbances to the residual is minimised (for fault detection), subject to the constraint that the transfer matrix function from faults to the residual is equal to a pre-assigned diagonal transfer matrix (for isolation of possibly simultaneously occurring faults). Our solution is given in the form of linear matrix inequalities using state-space techniques, as well as a model matching problem using matrix factorisation techniques. A numerical example is given to illustrate the efficiency of the fault detection and isolation filter.
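The first step described here, an observer that cancels the system dynamics so that the residual reacts only to faults (and disturbances), can be shown with a discrete-time Luenberger observer. In the sketch, the matrices `A`, `B`, `C`, the gain `L`, the input, and the actuator-fault profile are all hypothetical, and no disturbance is modelled. The eigenvalues of A − LC are about 0.74 and 0.46, so the observer is stable. The paper's LMI-based filter design is not reproduced.

```python
def simulate_residuals(steps=40, fault_at=20, fault=1.0):
    """Residual r_k = y_k - C x_hat_k of a Luenberger observer for a
    hypothetical 2-state LTI system with an additive actuator fault."""
    A = [[0.9, 0.1], [0.0, 0.8]]
    B = [0.0, 1.0]
    C = [1.0, 0.0]
    L = [0.5, 0.2]          # observer gain (assumed); A - L C is stable
    x = [1.0, 0.0]
    xh = [1.0, 0.0]         # observer initialised at the true state (assumption)
    res = []
    for k in range(steps):
        u = 0.1                               # known input
        f = fault if k >= fault_at else 0.0   # actuator fault appears at k = fault_at
        y = C[0] * x[0] + C[1] * x[1]
        r = y - (C[0] * xh[0] + C[1] * xh[1])
        res.append(r)
        x = [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * (u + f) for i in range(2)]
        xh = [A[i][0] * xh[0] + A[i][1] * xh[1] + B[i] * u + L[i] * r for i in range(2)]
    return res

def detect(residuals, threshold):
    """Index of the first residual whose magnitude exceeds the threshold."""
    return next((k for k, r in enumerate(residuals) if abs(r) > threshold), None)
```

Without disturbances, the residual stays exactly zero until the fault propagates to the measured state. Here that happens two steps after onset, because the fault enters the second state, which reaches the output through A.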

14.
Real-time systems often have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be expensive, and the cost is exacerbated for systems which utilize concurrent processes. The concurrency present in most real-time systems and the further difficulties introduced by timing constraints suggest that providing tolerance for software faults may be inordinately expensive or complex. We believe that this need not be the case, and propose a straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time systems. The approach takes advantage of the structure of real-time systems to simplify error recovery, and a classification scheme for errors is introduced. Responses to each type of error are proposed which allow service to be maintained.
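For reference, the conventional state-restoration mechanism that this abstract argues can be costly looks roughly like a recovery block. The state is checkpointed, each alternate is tried in turn, and the checkpoint is restored before the next attempt. The sketch below uses a deep copy as the checkpoint, which is exactly the kind of overhead the paper seeks to avoid. It is not the paper's classification-based approach. The `primary`/`alternate` functions in the usage note are hypothetical.

```python
import copy

class AcceptanceTestFailed(Exception):
    pass

def recovery_block(state, alternates, acceptance_test):
    """Classic recovery block over a dict-valued state: checkpoint, try each
    alternate, and restore the checkpoint before trying the next one."""
    checkpoint = copy.deepcopy(state)
    for alt in alternates:
        try:
            result = alt(state)
            if acceptance_test(state, result):
                return result
        except Exception:
            pass                                  # treat a crash like a failed test
        state.clear()
        state.update(copy.deepcopy(checkpoint))   # state restoration
    raise AcceptanceTestFailed("all alternates failed")
```

For example, a faulty `primary` that sorts in the wrong order is rejected by a sortedness acceptance test, its side effects are rolled back, and the `alternate` result is returned.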

15.
Designing fault-tolerant techniques for SRAM-based FPGAs   Total citations: 2, self-citations: 0, citations by others: 2
FPGAs have become prevalent in critical applications in which transient faults can seriously affect the circuit's operation. We present a fault tolerance technique for transient and permanent faults in SRAM-based FPGAs. This technique combines duplication with comparison (DWC) and concurrent error detection (CED) to provide a highly reliable circuit while maintaining hardware, pin, and power overheads far lower than with classic triple-modular-redundancy techniques.
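The combination works as follows. DWC detects that the two copies disagree, and a CED check then decides which copy to trust, so two modules can mask an error without a third voter copy. The sketch below models this for an adder. It assumes the CED check is recomputation with shifted operands, a time-redundancy scheme. The abstract does not specify the CED method, and the `good`/`stuck_bit0` units are hypothetical.

```python
def dwc_with_ced(a, b, unit0, unit1, width=8):
    """Duplication with comparison; on mismatch, a CED check (assumed here:
    recomputation with shifted operands) selects the trustworthy copy."""
    mask = (1 << width) - 1
    r0, r1 = unit0(a, b) & mask, unit1(a, b) & mask
    if r0 == r1:
        return r0                          # copies agree: no error detected
    for unit, r in ((unit0, r0), (unit1, r1)):
        # A fault-free adder satisfies ((2a + 2b) >> 1) == a + b.
        if (unit(a << 1, b << 1) >> 1) & mask == r:
            return r
    raise RuntimeError("both copies flagged as faulty")

def good(a, b):
    return a + b

def stuck_bit0(a, b):
    return (a + b) & ~1                    # hypothetical stuck-at-0 on output bit 0
```

With `a=3, b=4`, the faulty copy outputs 6 and the good copy 7. Shifting the operands moves the faulty copy's result away from its stuck bit, so the mismatch exposes it and 7 is returned regardless of which copy is faulty.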

16.
Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point toward multicore designs, there is substantial interest in adapting such parallel hardware resources for transient fault tolerance. This paper presents process-level redundancy (PLR), a software technique for transient fault tolerance, which leverages multiple cores for low overhead. PLR creates a set of redundant processes per application process and systematically compares the processes to guarantee correct execution. Redundancy at the process level allows the operating system to freely schedule the processes across all available hardware resources. PLR uses a software-centric approach to transient fault tolerance, which shifts the focus from ensuring correct hardware execution to ensuring correct software execution. As a result, many benign faults that do not propagate to affect program correctness can be safely ignored. A real prototype is presented that is designed to be transparent to the application and can run on general-purpose single-threaded programs without modifications to the program, operating system, or underlying hardware. The system is evaluated for fault coverage and performance on a four-way SMP machine and provides improved performance over existing software transient fault tolerance techniques, with a 16.9 percent overhead for fault detection on a set of optimized SPEC2000 binaries.
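As a rough picture of process-level redundancy, the sketch below launches several independent OS processes of the same program, lets the operating system schedule them freely, and majority-votes on their output. This loosely mimics PLR's output comparison. Real PLR intercepts and compares system calls transparently, which this sketch does not attempt. The function name `run_replicated` is hypothetical.

```python
import subprocess
import sys
from collections import Counter

def run_replicated(program, replicas=3):
    """Run Python source `program` in `replicas` independent processes and
    majority-vote on their stdout. Returns (winning output, faulty indices)."""
    procs = [subprocess.Popen([sys.executable, "-c", program],
                              stdout=subprocess.PIPE, text=True)
             for _ in range(replicas)]                 # started concurrently
    outputs = [p.communicate()[0] for p in procs]
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority: unrecoverable divergence")
    faulty = [i for i, out in enumerate(outputs) if out != winner]
    return winner, faulty
```

With three replicas, one divergent replica is outvoted and reported in `faulty`, which corresponds to PLR's detect-and-recover mode. Two replicas would only detect a mismatch.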

17.
A Hybrid Search Algorithm for Fault-Tolerant Priority Assignment   Total citations: 1, self-citations: 0, citations by others: 1
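In real-time systems, a task's failure to produce a correct result in time can have catastrophic consequences, so fault tolerance is crucial to the effectiveness and reliability of such systems. Building on schedulability analysis via worst-case response time computation, a hybrid search algorithm for fault-tolerant priority assignment is proposed. By allowing alternate tasks to run at either a higher or a lower priority level, the algorithm effectively improves the fault tolerance of the system. Experimental tests show that, compared with the best-known algorithms of the same kind, it is more effective at improving system fault tolerance.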
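The schedulability test underlying such priority assignment is the worst-case response-time recurrence for fixed-priority tasks. A common fault-tolerant form adds one recovery term per fault that can arrive within the response window:

R_i = C_i + Σ_{j ∈ hp(i)} ⌈R_i/T_j⌉·C_j + ⌈R_i/T_F⌉·max_{k ∈ hep(i)} C̄_k

Here C̄_k is the cost of task k's alternate and T_F is an assumed minimum time between faults. The sketch below iterates this recurrence. It assumes alternates run at their primary's priority, so it does not model the paper's hybrid search, which places alternates at different priority levels.

```python
import math

def response_times(tasks, t_f):
    """tasks: list of (C, T, D, C_alt) in decreasing priority order.
    t_f: assumed minimum inter-arrival time between transient faults.
    Returns each task's worst-case response time, or None if it misses D."""
    results = []
    for i, (c, _, d, _) in enumerate(tasks):
        recovery = max(ca for (_, _, _, ca) in tasks[: i + 1])  # worst alternate in hep(i)
        r = c
        while True:
            nxt = (c
                   + sum(math.ceil(r / tj) * cj for (cj, tj, _, _) in tasks[:i])
                   + math.ceil(r / t_f) * recovery)
            if nxt == r or nxt > d:       # fixed point reached, or deadline exceeded
                break
            r = nxt
        results.append(r if nxt <= d else None)
    return results
```

With the two tasks `(1, 4, 4, 1)` and `(2, 10, 10, 2)`, a fault every 20 time units leaves both schedulable (response times 2 and 6). A fault every 3 time units makes the lower-priority task miss its deadline.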

18.
Debugging deployed systems is an arduous and time-consuming task. It is often difficult to generate traces from deployed systems due to the disturbance and overhead that trace collection may cause on a system in operation. Many organizations also do not keep historical traces of failures. On the other hand, earlier techniques focusing on fault diagnosis in deployed systems require a collection of passing–failing traces, in-house reproduction of faults, or a historical collection of failed traces. In this paper, we investigate an alternative solution. We investigate how artificial faults, generated using software mutation in a test environment, can be used to diagnose actual faults in deployed software systems. The use of traces of artificial faults can provide relief when it is not feasible to collect different kinds of traces from deployed systems. Using artificial and actual faults, we also investigate the similarity of function call traces of different faults in functions. To achieve our goal, we use decision trees to build a model of traces generated from mutants and test it on faulty traces generated from actual programs. The application of our approach to various real-world programs shows that mutants can indeed be used to diagnose faulty functions in the original code with approximately 60–100% accuracy when reviewing 10% or less of the code, whereas contemporary techniques using pass–fail traces show poor results in the context of software maintenance. Our results also show that different faults in closely related functions occur with similar function call traces. The use of mutation in fault diagnosis shows promising results, but the experiments also show the challenges related to using mutants.
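Generating artificial faults by mutation can be as simple as rewriting one operator at a time in a program's syntax tree. The sketch below uses Python's standard `ast` module to produce one mutant per `+`/`-` site by swapping the operator. It requires Python 3.9+ for `ast.unparse`. Only this mutant-generation step is illustrated. Trace collection and the decision-tree model are not shown, and the single mutation operator is an illustrative choice, not necessarily one the paper used.

```python
import ast

class SwapAddSub(ast.NodeTransformer):
    """Mutation operator: flip the `target`-th binary + / - in the tree."""

    def __init__(self, target):
        self.target, self.seen = target, 0

    def visit_BinOp(self, node):
        self.generic_visit(node)               # visit operands first
        if isinstance(node.op, (ast.Add, ast.Sub)):
            if self.seen == self.target:
                node.op = ast.Sub() if isinstance(node.op, ast.Add) else ast.Add()
            self.seen += 1
        return node

def mutants(source):
    """Yield (site index, mutated source) for every + / - site in `source`."""
    n_sites = sum(isinstance(n, ast.BinOp) and isinstance(n.op, (ast.Add, ast.Sub))
                  for n in ast.walk(ast.parse(source)))
    for k in range(n_sites):
        tree = SwapAddSub(k).visit(ast.parse(source))
        yield k, ast.unparse(ast.fix_missing_locations(tree))
```

For `return a + b - c`, the two mutants are `a - b - c` and `a + b + c`. Running each mutant against a test input produces a faulty variant whose traces could then be collected.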

19.
Fault-tolerant grid architecture and practice   Total citations: 10, self-citations: 0, citations by others: 10
Grid computing has emerged as an effective technology to couple geographically distributed resources and solve large-scale computational problems in wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems. Unreliable fault detection is one of the most effective techniques. Globus, as grid middleware, manages resources in a wide area network. The Globus fault detection service uses well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain consistency in the grid. A fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service is presented in this paper. The platform offers effective strategies in three aspects: grid key components, user tasks, and high-level applications.
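An "unreliable" fault detector may wrongly suspect a live process. It corrects itself when a late heartbeat arrives, typically by lengthening that process's timeout. The heartbeat-based sketch below illustrates the general technique and uses an explicit clock argument so it is deterministic. It is not the Globus fault detection service's implementation. The class name and the doubling back-off are assumptions.

```python
class HeartbeatDetector:
    """Unreliable failure detector: suspects a process whose heartbeat is
    late, retracts the suspicion if a heartbeat later arrives, and doubles
    that process's timeout to make future false suspicions less likely."""

    def __init__(self, timeout):
        self.default = timeout
        self.timeout = {}      # per-process adaptive timeouts
        self.last = {}         # time of the last heartbeat per process
        self.suspected = set()

    def heartbeat(self, pid, now):
        if pid in self.suspected:
            self.suspected.discard(pid)                    # false suspicion
            self.timeout[pid] = self.timeout.get(pid, self.default) * 2
        self.last[pid] = now

    def check(self, now):
        for pid, t in self.last.items():
            if now - t > self.timeout.get(pid, self.default):
                self.suspected.add(pid)
        return set(self.suspected)
```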

20.
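To use the IEEE-standard Digital Test Interchange Format (DTIF) correctly and effectively, and to improve the fault-diagnosis capability and compatibility of automatic test systems for digital circuits, the structure and composition of this data format are analyzed in depth. For guided-probe testing using the probe group data, an intelligent dynamic association network technique is proposed to improve probing efficiency. For fault-dictionary diagnosis using the fault dictionary group data, a "991" matching rule is proposed to accurately isolate fault sets. Actual diagnosis of a digital circuit demonstrates that the two methods are effective, accurate, and feasible. Implementing this technique is significant for promoting the application of the DTIF data format and for raising the level of digital-circuit fault diagnosis.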
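Fault-dictionary diagnosis compares the observed pass/fail response of the unit under test against the precomputed signature of each modelled fault. The sketch below uses a generic nearest-match (Hamming-distance) rule and returns the whole ambiguity group when several faults tie. It is not the paper's "991" matching rule, whose details the abstract does not give. The fault names and signatures are hypothetical.

```python
def diagnose(dictionary, observed):
    """dictionary: fault name -> tuple of expected test outcomes (1 = fail).
    Returns (sorted list of best-matching faults, Hamming distance)."""
    def dist(signature):
        return sum(a != b for a, b in zip(signature, observed))
    best = min(dist(s) for s in dictionary.values())
    return sorted(f for f, s in dictionary.items() if dist(s) == best), best
```

An exact match isolates a single fault at distance 0. A response that matches no entry exactly yields the closest ambiguity group, which would then need further tests or guided probing.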

