Similar documents
 20 similar documents found; search took 31 ms
1.
Developers of fault-tolerant distributed systems need to guarantee that fault tolerance mechanisms they build are in themselves reliable. Otherwise, these mechanisms might in the end negatively affect overall system dependability, thus defeating the purpose of introducing fault tolerance into the system. To achieve the desired levels of reliability, mechanisms for detecting and handling errors should be developed rigorously or formally. We present an approach to modeling and verifying fault-tolerant distributed systems that use exception handling as the main fault tolerance mechanism. In the proposed approach, a formal model is employed to specify the structure of a system in terms of cooperating participants that handle exceptions in a coordinated manner, and coordinated atomic actions serve as representatives of mechanisms for exception handling in concurrent systems. We validate the approach through two case studies: (i) a system responsible for managing a production cell, and (ii) a medical control system. In both systems, the proposed approach has helped us to uncover design faults in the form of implicit assumptions and omissions in the original specifications.

2.
When building dependable systems by integrating untrusted software components that were not originally designed to interact with each other, architectural mismatches related to assumptions about their failure behaviour are likely to occur. These mismatches, if not prevented during system design, have to be tolerated during runtime. This paper presents an architectural abstraction based on exception handling for structuring fault-tolerant software systems. This abstraction comprises several components and connectors that promote an existing untrusted software element into an idealised fault-tolerant architectural element. Moreover, it is considered in the context of a rigorous software development approach based on formal methods for representing the structure and behaviour of the software architecture. The proposed approach relies on formal specification and verification for analysing exception propagation and verifying important dependability properties, such as deadlock freedom, and scenarios of architectural reconfiguration. The formal models are automatically generated using model transformation from UML diagrams: a component diagram representing the system structure, and sequence diagrams representing the system behaviour. Finally, the formal models are also used for generating unit and integration test cases that are used for assessing the correctness of the source code. The feasibility of the proposed architectural approach was evaluated on an embedded critical case study. Patrick Brito is supported by Fapesp/Brazil under Grant No. 06/02116–2 and CAPES/Brazil under Grant No. 0722–07–3. Cecília Rubira is partially supported by CNPq/Brazil under Grant Nos. 301446/2006–7 and 484138/2006–5.

3.
This paper deals with evaluation of the dependability (considered as a generic term, whose main measures are reliability, availability, and maintainability) of software systems during their operational life, in contrast to most of the work performed up to now, devoted mainly to development and validation phases. The failure process due to design faults, and the behavior of a software system up to the first failure and during its life cycle are successively examined. An approximate model is derived which enables one to account for the failures due to the design faults in a simple way when evaluating a system's dependability. This model is then used for evaluating the dependability of 1) a software system tolerating design faults, and 2) a computing system with respect to physical and design faults.

4.
A wide range of commercial consumer devices such as mobile phones and smart televisions rely on embedded systems software to provide their functionality. Testing is one of the most commonly used methods for validating this software, and improved testing approaches could increase these devices’ dependability. In this article we present an approach for performing such testing. Our approach is composed of two techniques. The first technique involves the selection of test data; it utilizes test adequacy criteria that rely on dataflow analysis to distinguish points of interaction between specific layers in embedded systems and between individual software components within those layers, while also tracking interactions between tasks. The second technique involves the observation of failures: it utilizes a family of test oracles that rely on instrumentation to record various aspects of a system's execution behavior, and compare observed behavior to certain intended system properties that can be derived through program analysis. Empirical studies of our approach show that our adequacy criteria can be effective at guiding the creation of test cases that detect faults, and our oracles can help expose faults that cannot easily be found using typical output-based oracles. Moreover, the use of our criteria accentuates the fault-detection effectiveness of our oracles.

5.
The coordinated atomic action concept was proposed as a means for providing fault tolerance in complex object-oriented systems that incorporate both cooperative and competitive concurrency. This paper has two purposes: to discuss a particular implementation of this concept and to address a number of the implementation issues that are common to any experiments with this concept. Our implementation relies on a detailed set of programming conventions for the standard Ada 95 language and uses a scheme of forward error recovery incorporating concurrent exception handling and resolution. Ada 95 has a number of unique features which make it a particularly good choice for our experiments. We believe that our approach is practical and useful for many critical applications with high dependability requirements.

6.
As aspects extend or replace existing functionality at specific join points in the code, their behavior may raise new exceptions, which can flow through the program execution in unexpected ways. Assuring the reliability of exception handling code in aspect-oriented (AO) systems is a challenging task. Testing the exception handling code is inherently difficult, since it is tricky to provoke all exceptions during tests, and the large number of different exceptions that can happen in a system may lead to the test-case explosion problem. Moreover, we have observed that some properties of AO programming (e.g., quantification, obliviousness) may conflict with characteristics of exception handling mechanisms, exacerbating existing problems (e.g., uncaught exceptions). The lack of verification approaches for exception handling code in AO systems stimulated the present work. This work presents a verification approach based on a static analysis tool, called SAFE, to check the reliability of exception handling code in AspectJ programs. We evaluated the effectiveness and feasibility of our approach in two complementary ways: (i) by investigating whether the SAFE tool is precise enough to uncover exception flow information and (ii) by applying the approach to three medium-sized AspectJ systems from different application domains.

7.
Large-scale distributed applications such as online information retrieval and collaboration over computational elements demand an approach to self-managed computing systems with a minimum of human interference. However, large scales and full distribution often lead to poor system dependability and security, and increase the difficulty in managing and controlling redundancy for fault tolerance. In particular, fault tolerance schemes for mobile agents to survive agent server crash failures in an autonomic environment are complex since developers normally have no control over remote agent servers. Some solutions inject a replica into stable storage upon its arrival at an agent server. But in the event of an agent server crash, the replica is unavailable until the agent server recovers. In this paper we present a failure model and an exception handling framework for mobile agent systems. An exception handling scheme is developed for mobile agents to survive agent server crash failures. A replica mobile agent operates at the agent server visited prior to its master's current location. If a master crashes, its replica is available as a replacement. The proposed scheme is examined in comparison with a simple time-out scheme. Experimental evaluation is performed, and performance results show that the scheme leads to some overhead in the round trip time when fault tolerance measures are exercised. However, the scheme offers the advantage that fault tolerance is provided during the mobile agent trip, i.e. in the event of an agent server crash all agent servers are not revisited.

8.
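Programming Implementation of Exception Handling Techniques in C++   Total citations: 2, self-citations: 0, citations by others: 2
Exception handling is an important language mechanism of C++, and handling exceptions correctly is essential to the reliability and robustness of programs. This paper reviews the concepts and ideas behind exception handling, introduces common problems encountered in C++ exception handling, and analyzes the performance and cost of exception handling, so that exception handling techniques can be used correctly when programming in an object-oriented style.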

9.
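Exception handling mechanisms can enhance the reliability of program execution and improve software robustness, but the exception handling code itself may contain errors. Because of its special nature, testing it with the same methods used for ordinary code is usually inefficient and rarely achieves the desired effect. Building on an analysis of software fault injection based on the assertion violation strategy, this paper proposes combining the special structure of the Java exception handling mechanism with the assertion violation strategy and program mutation techniques, which makes it possible to test exception handling code effectively, and designs a tool to support this fault injection method.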
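The general idea of reaching a handler by injecting a fault can be sketched in Java as follows. This is only an illustration under stated assumptions, not the tool or the assertion violation strategy described in the abstract: the `FaultPoint` hook, the `loadSetting` method, and the returned strings are all hypothetical names invented for the example.

```java
import java.io.IOException;

public class HandlerFaultInjection {

    /** Hypothetical injection hook: when armed, the guarded operation fails. */
    public interface FaultPoint {
        void check() throws IOException;
    }

    /** Hook that injects nothing, so the normal path runs. */
    public static final FaultPoint NO_FAULT = () -> { };

    /** Hook that forces the exception a real I/O failure would raise. */
    public static final FaultPoint INJECT_IO_FAULT = () -> {
        throw new IOException("injected fault");
    };

    /** Code under test: the catch block is the part the test wants to reach. */
    public static String loadSetting(FaultPoint fault) {
        try {
            fault.check();             // injection point placed before the guarded work
            return "value-from-disk";  // normal path (stands in for real I/O)
        } catch (IOException e) {
            return "default";          // exception handler under test
        }
    }

    public static void main(String[] args) {
        System.out.println(loadSetting(NO_FAULT));
        System.out.println(loadSetting(INJECT_IO_FAULT));
    }
}
```

Passing the hook as a parameter keeps the injection explicit and deterministic; a mutation-based tool would presumably rewrite the program to place such throw sites automatically instead.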

10.
Clark, J.A.; Pradhan, D.K. Computer, 1995, 28(6): 47-56
A fault-tolerant computer system's dependability must be validated to ensure that its redundancy has been correctly implemented and that the system will provide the desired level of reliable service. Fault injection (the deliberate insertion of faults into an operational system to determine its response) offers an effective solution to this problem. We survey several fault injection studies and discuss tools such as React (Reliable Architecture Characterization Tool) that facilitate its application.

11.
Exception handling enables programmers to specify the behavior of a program when an exceptional event occurs at runtime. Exception handling, thus, facilitates software fault tolerance and the production of reliable and robust software systems. With the recent emergence of multi-processor systems and parallel programming constructs, techniques are needed that provide exception handling support in these environments that are intuitive and easy to use. Unfortunately, extant semantics of exception handling for concurrent settings are significantly more complex to reason about than their serial counterparts. In this paper, we investigate a similarly intuitive semantics for exception handling for the future parallel programming construct in Java. Futures are used by programmers to identify potentially asynchronous computations and to introduce parallelism into sequential programs. The intent of futures is to provide some performance benefits through the use of method-level concurrency while maintaining as-if-serial semantics that novice programmers can easily understand: the semantics of a program with futures is the same as that for an equivalent serial version of the program. We extend this model to provide as-if-serial exception handling semantics. Using this model, our runtime delivers exceptions to the same point it would deliver them if the program were executed sequentially. We present the design and implementation of our approach and evaluate its efficiency using an open source Java virtual machine.
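A rough library-level approximation of delivering exceptions in program order can be written with the standard `java.util.concurrent` API. This is a minimal sketch under stated assumptions, not the runtime-level mechanism the abstract describes: the class `AsIfSerialFutures` and the method `runInProgramOrder` are hypothetical names, and the sketch does not undo side effects of tasks that a serial run would never have started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsIfSerialFutures {

    /**
     * Runs the tasks concurrently but collects results in program order, so
     * the exception that reaches the caller is the one a sequential loop
     * over the same tasks would have raised first.
     */
    public static <T> List<T> runInProgramOrder(List<Callable<T>> tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(pool.submit(task));   // spawn every task as a future
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {    // touch futures in program order
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;  // rethrow the task's own exception
                    }
                    throw e;                      // errors stay wrapped
                }
            }
            return results;
        } finally {
            pool.shutdownNow();                   // interrupt tasks that are still running
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        tasks.add(() -> { Thread.sleep(50); return 1; });
        tasks.add(() -> { Thread.sleep(100); throw new IllegalStateException("second task"); });
        tasks.add(() -> { throw new IllegalArgumentException("third task"); });
        try {
            runInProgramOrder(tasks);
        } catch (Exception e) {
            // The third task fails first in wall-clock time, but the second
            // task's exception is the one delivered here.
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }
}
```

Waiting on the futures in submission order is what fixes the delivery point; the cost is that a later failure is not noticed until every earlier future has completed.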

12.
In many software systems, properties necessary for dependable operation are only a small subset of all desirable system properties. Assuring properties over the simpler subset can provide assurance of critical properties over the entire system. This work provides a method for constructing systems to be dependably reconfigurable. A system's primary function can have less demanding dependability requirements than the overall system because the system can reconfigure to some simpler function. Reconfiguration thus controls the effective complexity of the system without forcing that system to sacrifice desired, but unassurable, capabilities. Focusing a system's dependability argument on reconfiguration means that reconfiguration must proceed correctly with very high assurance. The system construction approach in this work also provides a method through which system dependability properties can be shown. To illustrate the ideas in this work, we have built part of a hypothetical avionics system that is typical of what might be found on an unmanned aerial vehicle.

13.
The state of the art in handling and resolving concurrent exceptions is discussed and a brief outline of all research in this area is given. Our intention is to demonstrate that exception resolution is a very useful concept which facilitates joint forward error recovery in concurrent and distributed systems. To do this, several new arguments are considered. We understand resolution as reaching an agreement among cooperating participants of an atomic action. It is provided by the underlying system to make it unified and less error prone, which is important because forward error recovery is complex by nature. We classify atomic action schemes into asynchronous and synchronous ones and discuss exception handling for schemes of both kinds. The paper also deals with introducing atomic action schemes based on exception resolution into existing concurrent and distributed languages, which usually have only local exceptions. We outline the basic approach and demonstrate its applicability by showing how exception resolution can be used in Ada 83, Ada 95 (for both concurrent and distributed systems) and Java. A discussion of ways to make this concept more object-oriented and, with the help of reflection, more flexible and useful, concludes the paper.

14.

Observing and controlling the dependability of service provision of complex IoT systems is challenging. In practice, many organizations struggle to derive consumer needs related to quality and to observe and quantify the service provision in the context of the dynamic behavior of a complex distributed system. In this paper, we present an approach to define and evaluate the dependability of complex IoT systems. Our approach is an adaptation of ISO/IEC 25040, an international standard for the evaluation process for system and software quality, which is part of the systems and software quality requirements and evaluation (SQuaRE) series. Our approach was designed and evaluated with action research in an industrial study at Robert Bosch GmbH. Based on the framework of the SQuaRE series, we integrated different elements of site reliability engineering (SRE) and combined them with distributed tracing as a promising measurement method. Our approach introduces the IoT transaction concept to reduce modeling and observation efforts while increasing operationalization to measure performance against dependability targets. Our adaptation was applied effectively, consumer-centricity across different system stakeholders was enhanced, and negative consequences of organizational silos were reduced. This has improved the dependability evaluation of service provision to enable fast feedback cycles for service performance control and improvement.

15.
Fault tolerance in computerized systems involved in production has become an ever more important requirement. Existing fault tolerance approaches, wherever used, deal mainly with hardware faults. Nevertheless, the vast majority of contemporary system failures are software related. This paper introduces a knowledge-based approach to handling software related faults occurring in supervisory control systems. These systems are event driven and use data, stored in complex databases, to react to events coming from different kinds of devices by identifying, scheduling, initiating and monitoring operations. Failure of part of the supervisory control system's software to behave rationally when unexpected events occur is called an application fault. The approach introduced in this paper is based on a supervisory control system reference model which reveals the set of all possible application faults together with the major functions of the recovery processes associated with each fault, and leads to a high-level knowledge-based system architecture capable of handling every fault-related condition. This system is called PROFIT (Intelligent PROduction systems Fault Tolerance) and consists of three main components: the fault diagnosis module, the instant fault correction module and the learning module, co-ordinated by a PROFIT meta-level module. The prototype version of PROFIT is analysed and the development as well as the run-time environment that prove the applicability and effectiveness of the system are presented.

16.
A Real-Time Architectural Specification (RAS) approach and its application to command and control (C2) systems are presented. Our objective is to establish a formal foundation that will enable us to integrate existing rich but fragmented formal techniques for system specification and verification into a practical and scaleable formal engineering method to support the design and development of highly reliable real-time distributed systems. The contribution of RAS is twofold: First, it provides a formal system that integrates system's timing requirements and requirements propagation into the process of architectural modeling and design in such a way that allows us to systematically enforce that the requirements are met in every step of the design process. Second, it offers an incremental and more scaleable approach for design modeling. These two features together make RAS a suitable model for the design of C2 systems. We further present an incremental method for verifying timing properties of an RAS model that helps to reduce the complexity of analysis both at a given design level and across different design levels. This revised version was published online in June 2006 with corrections to the Cover Date.

17.
Exception handling is widely regarded as a necessity in programming languages today and almost every programming language currently used for professional software development supports some form of it. However, spreadsheet systems, which may be the most widely used type of “programming language” today in terms of number of users using it to create “programs” (spreadsheets), have traditionally had only extremely limited support for exception handling. Spreadsheet system users range from end users to professional programmers and this wide range suggests that an approach to exception handling for spreadsheet systems needs to be compatible with the equational reasoning model of spreadsheet formulas, yet feature expressive power comparable to that found in other programming languages. We present an approach to exception handling for spreadsheet system users that is aimed at this goal. Some of the features of the approach are new; others are not new, but their effects on the programming language properties of spreadsheet systems have not been discussed before in the literature. We explore these properties, offer our solutions to problems that arise with these properties, and compare the functionality of the approach with that of exception handling approaches in other languages.

18.
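Methods and Levels of Exception Handling in Workflow Management Systems   Total citations: 1, self-citations: 0, citations by others: 1
Workflow technology is receiving growing attention in information processing applications, but the continual development and change of environments and user requirements in practice demand flexible handling capabilities from workflow management systems; exception handling in workflow systems is meant to address exactly these constantly changing requirements. This article introduces the scope of application of workflow exception handling, summarizes the different approaches in use, proposes, from a system perspective, future levels of exception handling for workflows, and discusses research on adaptive workflow technology.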

19.
Workflow management systems (WfMSs) are being increasingly deployed to deliver e-business transactions across organizational boundaries. To ensure a high service quality in such transactions, exception-handling schemes for conflict resolution are needed. The conflicts arise primarily when a task fails during workflow execution because of failures in the underlying application or the controlling WfMS components, or because of insufficient user input. So far, little progress has been reported in addressing conflict resolution in cross-organizational business processes, though its importance has been recognized. In this paper, we identify the exception handling techniques that support conflict resolution in cross-organizational settings. In particular, we propose a novel, bundled exception-handling approach, which supports (1) exception knowledge sharing: sharing exception specifications and handling experiences, (2) coordinated exception handling, and (3) intelligent problem solving: using case-based reasoning to reuse exception handling experiences. A prototype of this exception handling mechanism is developed and integrated as a part of the METEOR Workflow Management System. An evaluation of our approach is also presented through some sample workflow applications.

20.
Exception handling design can improve robustness, which is an important quality attribute of software. However, exception handling design remains one of the less understood and considered parts in software development. In addition, like most software design problems, even if developers are requested to design with exception handling beforehand, it is very difficult to get the design right on the first attempt. Therefore, improving exception handling design after software is constructed is necessary. This paper applies refactoring to incrementally improve exception handling design. We first establish four exception handling goals to stage the refactoring actions. Next, we introduce exception handling smells that hinder the achievement of the goals and propose exception handling refactorings to eliminate the smells. We suggest exception handling refactoring is best driven by bug fixing because it provides measurable quality improvement results that explicitly reveal the benefits of refactoring. We conduct a case study with the proposed refactorings on a real world banking application and provide a cost-effectiveness analysis. The result shows that our approach can effectively improve exception handling design, enhance software robustness, and reduce maintenance cost. Our approach simplifies the process of applying big exception handling refactoring by dividing the process into clearly defined intermediate milestones that are easily exercised and verified. The approach can be applied in general software development and in legacy system maintenance.
