首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
可信计算及其关键技术研究   总被引:2,自引:0,他引:2  
The dependability is the latest and highest techno-target used to evaluate the performance quality of a dis-tributed computing system in open network environment, it includes traditional reliability, availability, robustness,survivability, security, data integrity and software protecting ability, etc. A dependable system should not only be provided with fault tolerance ability, but also withstand from risk and recover from disaster, its realization foun dationis the high availability of the information transmission Jaetwork and survivability, fault tolerance and security safe-guard of the system. This paper presents a survey of the survivability mechanisms such as long-distance backup, clus-ter and system recovery, while discussing the techniques of fault tolerance design and information network system se-curity safeguard, and analyzing the information redundant dispersal strategy and model for survivability and security safeguard.  相似文献   

2.
This paper presents the design and implementation of Jgroup/ARM, a distributed object group platform with autonomous replication management along with a novel measurement‐based assessment technique that is used to validate the fault‐handling capability of Jgroup/ARM. Jgroup extends Java RMI through the group communication paradigm and has been designed specifically for application support in partitionable systems. ARM aims at improving the dependability characteristics of systems through a fault‐treatment mechanism. Hence, ARM focuses on deployment and operational aspects, where the gain in terms of improved dependability is likely to be the greatest. The main objective of ARM is to localize failures and to reconfigure the system according to application‐specific dependability requirements. Combining Jgroup and ARM can significantly reduce the effort necessary for developing, deploying and managing dependable, partition‐aware applications. Jgroup/ARM is evaluated experimentally to validate its fault‐handling capability; the recovery performance of a system deployed in a wide area network is evaluated. In this experiment multiple nearly coincident reachability changes are injected to emulate network partitions separating the service replicas. The results show that Jgroup/ARM is able to recover applications to their initial state in several realistic failure scenarios, including multiple, concurrent network partitionings. Copyright © 2007 John Wiley & Sons, Ltd.  相似文献   

3.
This paper deals with evaluation of the dependability (considered as a generic term, whose main measures are reliability, availability, and maintainability) of software systems during their operational life, in contrast to most of the work performed up to now, devoted mainly to development and validation phases. The failure process due to design faults, and the behavior of a software system up to the first failure and during its life cycle are successively examined. An approximate model is derived which enables one to account for the failures due to the design faults in a simple way when evaluating a system's dependability. This model is then used for evaluating the dependability of 1) a software system tolerating design faults, and 2) a computing system with respect to physical and design faults.  相似文献   

4.
On dependability of computing systems   总被引:2,自引:0,他引:2       下载免费PDF全文
With the rapid development and wide applications of computing systems on which more reliance has been put,a dependable system will be much more important than ever.This paper is first aimed at giving informal but precise definitions characterizing the various attributes o dependability of computing systems an then th importance of (and the relationships among)all te the attributes are explained.Dependability is first introduced as a blobal concept which subsumes the usual atributes of reliability,availability,maintainability,safety and sevurity.The basic definitions given here are then commended and supplemented by detailed material and additional explanations in the subsequent sections. The presentation has been structured as follows so as to attract the reader‘s attention to the important attributions of dependability.Search for a few number of concise concepts enabling the dependability attributes to be expressed as clearly as possible.Use of terms which are identical or as close as possible to those commonly used nowadays.This paper is also intended to provoke people‘s interest in designing a dependable computing system.  相似文献   

5.
Based on extensive field failure data for Tandem's GUARDIAN operating system, the paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate  相似文献   

6.
Faults or failures are inevitable to occur and their prompt detection and isolation are essential for the dependability of various systems and for avoiding damages to the system itself, persons and the environment. Therefore, the safety of helicopter platforms have attracted the attention of many researchers in the past two decades. In order to deal with these problems, this paper presents an overview of the recent development and current researches in the field of fault diagnosis, including analytical/model-based, signal processing-based and knowledge-based techniques, and also passive/active fault- tolerant control approaches. Among various helicopters, single-rotor aerial vehicles, i.e. manned helicopters, unmanned helicopters, two and three degree-of-freedom unmanned helicopter experimental platforms, are considered for providing an overall picture of the fault diagnosis and fault-tolerant control approaches based on the review of journal articles in last two decades, conference articles in last several years and some books.  相似文献   

7.
In general, faults cannot be prevented; instead, they need to be tolerated to guarantee certain degrees of software dependability. We develop a theory for fault tolerance for a distributed pi-calculus, whereby locations act as units of failure and redundancy is distributed across independently failing locations. We give formal definitions for fault tolerant programs in our calculus, based on the well studied notion of contextual equivalence. We then develop bisimulation proof techniques to verify fault tolerance properties of distributed programs and show they are sound with respect to our definitions for fault tolerance.  相似文献   

8.
人联网(IoP)系统的架构复杂且存在海量、实时变化的数据,使得基于IoP系统的可靠性分析变得十分困难,目前仍缺乏一种健全的基于IoP系统的可靠性建模及评估方法。提出一种新型的IoP系统可靠性评估方法,利用AADL及其附件语言对IoP系统进行可靠性建模,并基于该模型从定性角度评估系统故障的根本原因和风险。此外,结合Ocarina模型转换技术提出一种基于连续时间马尔科夫链(CTMC)的定量评估算法,将AADL可靠性模型转换为CTMC模型,实现对系统动态、实时等特性的评估。在此基础上,设计一个IoP系统通用模型,并以此为案例验证所提方法的可行性。实验结果表明,该方法不仅能对IoP系统建模,而且能自动、准确地对其进行可靠性分析,具有良好的应用价值。  相似文献   

9.
Fault Tree Analysis (FTA) is a well-established and well-understood technique, widely used for dependability evaluation of a wide range of systems. Although many extensions of fault trees have been proposed, they suffer from a variety of shortcomings. In particular, even where software tool support exists, these analyses require a lot of manual effort. Over the past two decades, research has focused on simplifying dependability analysis by looking at how we can synthesise dependability information from system models automatically. This has led to the field of model-based dependability analysis (MBDA). Different tools and techniques have been developed as part of MBDA to automate the generation of dependability analysis artefacts such as fault trees. Firstly, this paper reviews the standard fault tree with its limitations. Secondly, different extensions of standard fault trees are reviewed. Thirdly, this paper reviews a number of prominent MBDA techniques where fault trees are used as a means for system dependability analysis and provides an insight into their working mechanism, applicability, strengths and challenges. Finally, the future outlook for MBDA is outlined, which includes the prospect of developing expert and intelligent systems for dependability analysis of complex open systems under the conditions of uncertainty.  相似文献   

10.
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present.  相似文献   

11.
In the hot-standby replication system, the system cannot process its tasks anymore when all replicated nodes have failed. Thus, the remaining living nodes should be well-protected against failure when parts of replicated nodes have failed. Design faults and system-specific weaknesses may cause chain reactions of common faults on identical replicated nodes in replication systems. These can be alleviated by replicating diverse hardware and software. Going one-step forward, failures on the remaining nodes can be suppressed by predicting and preventing the same fault when it has occurred on a replicated node. In this paper, we propose a fault avoidance scheme which increases system dependability by avoiding common faults on remaining nodes when parts of nodes fail, and analyze the system dependability.  相似文献   

12.
In this paper, we propose and evaluate a framework for fault tolerant workflow execution in Grid environments. Different from previous work in the literature, our system dynamically chooses an appropriate fault tolerance technique while using a user-defined rule-based system. We also provide a generic interface that can be used to add fault tolerance techniques to the framework. The results obtained with real workflows in an experimental Grid environment show that the overhead introduced by our framework in a failure-free execution is, in the worst evaluated case, approximately 10 %. Moreover, we show that, using our framework, workflows are able to execute successfully in the presence of failures and that the framework can dynamically choose an appropriate fault tolerance technique. The main contributions of our work are twofold: the developed framework and the model-based dependability analysis we performed on it. The purpose in carrying out a model-based dependability analysis consists on evaluating the interaction between our framework and the distributed Grid environment beyond the physical limitations of an empirical evaluation. By doing this, we provide means to plan the assurance of QoS in the Grid resource allocation, while applying the fault-tolerance mechanisms we implement in our framework regardless of the underlying middleware.  相似文献   

13.
The AQuA architecture provides adaptive fault tolerance to CORBA applications by replicating objects and providing a high-level method that an application can use to specify its desired level of dependability. This paper presents the algorithms that AQUA uses, when an application's dependability requirements can change at runtime, to tolerate both value faults in applications and crash failures simultaneously. In particular, we provide an active replication communication scheme that maintains data consistency among replicas, detects crash failures, collates the messages generated by replicated objects, and delivers the result of each vote. We also present an adaptive majority voting algorithm that enables the correct ongoing vote while both the number of replicas and the majority size dynamically change. Together, these two algorithms form the basis of the mechanism for tolerating and recovering from value faults and crash failures in AQuA  相似文献   

14.
A dependability model for TMR system   总被引:1,自引:0,他引:1  
Much research has been done on the dependability evaluation of computer systems. However, much of this is gone no further than study of the fault coverage of such systems, with little focus on the relationship between fault coverage and overall system dependability. In this paper, a Markovian dependability model for triple-modular-redundancy (TMR) system is presented. Having fully considered the effects of fault coverage, working time, and constant failure rate of single module on the dependability of the target TMR system, the model is built based on the stepwise degradation strategy. Through the model, the relationship between the fault coverage and the dependability of the system is determined. What is more, the dependability of the system can be dynamically and precisely predicted at any given time with the fault coverage set. This will be of much benefit for the dependability evaluation and improvement, and be helpful for the system design and maintenance.  相似文献   

15.
Complex engineering systems, such as aircraft, industrial processes, and transportation systems, are experiencing a paradigm shift in the way they are operated and maintained. Instead of traditional scheduled or breakdown maintenance practices, they are maintained on the basis of their current state/condition. Condition-Based Maintenance (CBM) is becoming the preferred practice since it improves significantly the reliability, safety and availability of these critical systems. CBM enabling technologies include sensing and monitoring, information processing, fault diagnosis and failure prognosis algorithms that are capable of detecting accurately and in a timely manner incipient failures and predicting the remaining useful life of failing components. If such technologies are to be implemented on-line and in real-time, it is essential that an integrating system architecture be developed that possesses features of modularity, flexibility and interoperability while exhibiting attributes of computational efficiency for both on-line and off-line applications. This paper presents a .NET framework as the integrating software platform linking all constituent modules of the fault diagnosis and failure prognosis architecture. The inherent characteristics of the .NET framework provide the proposed system with a generic architecture for fault diagnosis and failure prognosis for a variety of applications. Functioning as data processing, feature extraction, fault diagnosis and failure prognosis, the corresponding modules in the system are built as .NET components that are developed separately and independently in any of the .NET languages. With the use of Bayesian estimation theory, a generic particle-filtering-based framework is integrated in the system for fault diagnosis and failure prognosis. The system is tested in two different applications—bearing spalling fault diagnosis and failure prognosis and brushless DC motor turn-to-turn winding fault diagnosis. The results suggest that the system is capable of meeting performance requirements specified by both the developer and the user for a variety of engineering systems.  相似文献   

16.
FTT—1:一个基于硬件的故障注入器的设计与实现   总被引:3,自引:0,他引:3  
故障注入是评价计算机系统可信性的一种重要的试验方法。构造故障注入器是故障注入研究中的一人重要组成部分。此文介绍了FTT-1,一个基于硬件的故障注入器的设计与实现。文章讨论了设计与实现硬件故障注入器的关键技术,并介绍了在FTT-1的实现中解决这些关键技术的方法。试验结果证明了FTT-1用于评价容错计算机系统可信性的有效性。  相似文献   

17.
Complex real-time system design needs to address dependability requirements, such as safety, reliability, and security. We introduce a modelling and simulation based approach which allows for the analysis and prediction of dependability constraints. Dependability can be improved by making use of fault tolerance techniques. The de-facto example, in the real-time system literature, of a pump control system in a mining environment is used to demonstrate our model-based approach. In particular, the system is modelled using the Discrete EVent system Specification (DEVS) formalism, and then extended to incorporate fault tolerance mechanisms. The modularity of the DEVS formalism facilitates this extension. The simulation demonstrates that the employed fault tolerance techniques are effective. That is, the system performs satisfactorily despite the presence of faults. This approach also makes it possible to make an informed choice between different fault tolerance techniques. Performance metrics are used to measure the reliability and safety of the system, and to evaluate the dependability achieved by the design. In our model-based development process, modelling, simulation and eventual deployment of the system are seamlessly integrated.  相似文献   

18.
嵌入式故障注入器HFI-4的研究与设计   总被引:3,自引:0,他引:3  
管脚级故障注入是对原型系统进行可信性验证的一种常用的方法。本文提出了一个基于嵌入式注入方式的管脚级故障注入工具HFI-4的设计方案,并重点讨论了设计过程中遇到的方向控制、同步控制、结果回收等几项关键技术的解决方案。  相似文献   

19.
This paper deals with industrial practices and strategies for Fault Tolerant Control (FTC) and Fault Detection and Isolation (FDI) in civil aircraft by focusing mainly on a typical Airbus Electrical Flight Control System (EFCS). This system is designed to meet very stringent requirements in terms of safety, availability and reliability that characterized the system dependability. Fault tolerance is designed into the system by the use of stringent processes and rules, which are summarized in the paper. The strategy for monitoring (fault detection) of the system components, as a part of the design for fault tolerance, is also described in this paper. Real application examples and implementation methodology are outlined. Finally, future trends and challenges are presented.This paper is a full version of the invited plenary talk presented by the author on the 1st July 2009 at the 7th IFAC Symposium Safeprocess '09, Barcelona.  相似文献   

20.
Complex software and network based information server systems may exhibit failures. Quite often, such failures may not be accidental. Instead some failures may be caused by deliberate security intrusions with the intent ranging from simple mischief, theft of confidential information to loss of crucial and possibly life saving services. Not only it is important to prevent and/or tolerate security intrusions, it is equally important to treat security as a QoS attribute at par with other QoS attributes such as availability and performance. This paper deals with various issues related to quantifying the security attributes of an intrusion tolerant system, such as the SITAR system. A security intrusion and the response of an intrusion tolerant system to an attack is modeled as a random process. This facilitates the use of stochastic modeling techniques to capture the attacker behavior as well as the system’s response to a security intrusion. This model is used to analyze and quantify the security attributes of the system. The security quantification analysis is first carried out for steady-state behavior leading to measures like steady-state availability. By transforming this model to a model with absorbing states, we compute a security measure called the “mean time (or effort) to security failure” (MTTSF) and also compute probabilities of security failure due to violations of different security attributes.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号