Similar literature
 20 similar documents found.
1.
The benefits of the analysis of software faults and failures have been widely recognized. However, detailed studies based on empirical data are rare. In this paper, we analyze the fault and failure data from two large, real-world case studies. Specifically, we explore: 1) the localization of faults that lead to individual software failures and 2) the distribution of different types of software faults. Our results show that individual failures are often caused by multiple faults spread throughout the system. This observation is important since it does not support several heuristics and assumptions used in the past. In addition, it clearly indicates that finding and fixing faults that lead to such software failures in large, complex systems are often difficult and challenging tasks despite the advances in software development. Our results also show that requirement faults, coding faults, and data problems are the three most common types of software faults. Furthermore, these results show that contrary to the popular belief, a significant percentage of failures are linked to late life cycle activities. Another important aspect of our work is that we conduct intra- and interproject comparisons, as well as comparisons with the findings from related studies. The consistency of several main trends across software systems in this paper and several related research efforts suggests that these trends are likely to be intrinsic characteristics of software faults and failures rather than project specific.

2.
A wide range of commercial consumer devices such as mobile phones and smart televisions rely on embedded systems software to provide their functionality. Testing is one of the most commonly used methods for validating this software, and improved testing approaches could increase these devices’ dependability. In this article we present an approach for performing such testing. Our approach is composed of two techniques. The first technique involves the selection of test data; it utilizes test adequacy criteria that rely on dataflow analysis to distinguish points of interaction between specific layers in embedded systems and between individual software components within those layers, while also tracking interactions between tasks. The second technique involves the observation of failures: it utilizes a family of test oracles that rely on instrumentation to record various aspects of a system's execution behavior, and compare observed behavior to certain intended system properties that can be derived through program analysis. Empirical studies of our approach show that our adequacy criteria can be effective at guiding the creation of test cases that detect faults, and our oracles can help expose faults that cannot easily be found using typical output-based oracles. Moreover, the use of our criteria accentuates the fault-detection effectiveness of our oracles.

3.
Fault-tolerant grid architecture and practice   (total citations: 10; self-citations: 0; citations by others: 10)
Grid computing has emerged as an effective technology for coupling geographically distributed resources and solving large-scale computational problems in wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems. Unreliable fault detection is one of the most effective techniques. Globus, a grid middleware, manages resources in a wide area network. The Globus fault detection service uses well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain consistency in the grid. A fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service is presented in this paper. The platform offers effective strategies in three aspects: grid key components, user tasks, and high-level applications.
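
As a rough illustration of what a timeout-based unreliable fault detector does, the following Python sketch may help. It is not the Globus fault detection service or the FTGP implementation; the class name `HeartbeatDetector`, its methods, and the timeout value are all hypothetical. The detector is "unreliable" in the sense that a slow but healthy node can be wrongly suspected, and the suspicion is withdrawn once a later heartbeat arrives.

```python
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector (illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a node is suspected
        self.last_seen = {}      # node name -> time of its last heartbeat

    def heartbeat(self, node, now):
        # Record that `node` reported in at time `now`.
        self.last_seen[node] = now

    def suspected(self, now):
        # Nodes silent for longer than the timeout are suspected, not proven, failed.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


detector = HeartbeatDetector(timeout=5.0)
detector.heartbeat("a", 0.0)
detector.heartbeat("b", 0.0)
detector.heartbeat("a", 4.0)
early = detector.suspected(6.0)   # "b" has been silent for 6 s
detector.heartbeat("b", 7.0)      # a late heartbeat clears the suspicion
late = detector.suspected(8.0)
```

Times are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.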

4.
Nowadays, Convolutional Neural Networks (CNNs) are widely used as prediction models in different fields, with intensive use in real-time safety-critical systems. Recent studies have demonstrated that hardware faults induced by external perturbations or aging effects may significantly impact CNN inference, leading to prediction failures. Therefore, ensuring the reliability of CNN platforms is crucial, especially when deployed in critical applications. A lot of effort has been made to reduce the memory and energy footprint of CNNs, paving the way to the adoption of approximate computing techniques such as quantization, reduced precision, weight sharing, and pruning. Unfortunately, approximate computing reduces the intrinsic redundancy of CNNs, making them more efficient but less resilient to hardware faults. The goal of this work is twofold. First, we assess the reliability of a CNN when reduced bit widths and two different data types (floating- and fixed-point) are used to represent the network parameters (i.e., synaptic weights). Second, we intend to investigate the best compromise between data type, bit-width reduction, and reliability. The characterization is performed through a fault injection environment built on the darknet open-source framework and targets two CNNs: LeNet-5 and YOLO. Experimental results show that fixed-point data provide the best trade-off between memory footprint reduction and CNN resilience. In particular, for LeNet-5, we achieved a 4x memory footprint reduction at the cost of a slightly reduced reliability (0.45% of critical faults) without retraining the CNN.
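
To make the idea of injecting a fault into a fixed-point weight concrete, here is a minimal Python sketch. It is not the darknet-based environment described in the abstract; the 8-bit format with 4 fractional bits and the helper names `to_fixed`, `from_fixed`, and `flip_bit` are assumptions chosen purely for illustration.

```python
def to_fixed(x, frac_bits=4, total_bits=8):
    """Quantize x to a signed fixed-point value, returned as a raw two's-complement bit pattern."""
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, q))                 # saturate to the representable range
    return q & ((1 << total_bits) - 1)


def from_fixed(raw, frac_bits=4, total_bits=8):
    """Decode a raw two's-complement bit pattern back to a float."""
    if raw & (1 << (total_bits - 1)):       # sign bit set: value is negative
        raw -= 1 << total_bits
    return raw / (1 << frac_bits)


def flip_bit(raw, bit):
    """Model a single-bit upset by inverting one bit of the stored word."""
    return raw ^ (1 << bit)


weight = to_fixed(1.5)                          # stored as 0b00011000
lsb_fault = from_fixed(flip_bit(weight, 0))     # least significant bit flipped
sign_fault = from_fixed(flip_bit(weight, 7))    # sign bit flipped
```

In this assumed format, a flip in the least significant bit shifts the weight by 1/16, while a flip in the sign bit turns 1.5 into -6.5, which suggests why the position of the faulty bit matters for whether a fault becomes critical.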

5.
6.
Nowadays, FPGA-based Networked Control Systems (NCSs) are frequently used. Transient and permanent faults occur often as a result of radiation in industrial environments. Accordingly, Fault-Tolerant (FT) FPGA-based NCSs are desired. In this paper, a novel NCS model is proposed, composed of In-Loop and S2A architectures linked via an Ethernet switch. This architecture is used in shape detection machines with vision sensing requirements. FT techniques are applied in the controller nodes of the system along with Dynamic Partial Reconfiguration (DPR) for FPGA-based controller recovery. This paper presents how the reliability of the system varies with changes in both the recovery rate and the conditional probability of failure occurrence (either transient or permanent). Accordingly, a Markov model is constructed for reliability calculations. A case study is used to illustrate the use of such a model to choose appropriate maintenance strategies as well as a quantitative measure for the ability of the FT techniques to increase system reliability. Coverage is then studied in the context of the same system. Furthermore, system failures are divided into safe system failures and unsafe system failures. Another Markov model is developed. Then, a case study is used to illustrate the effect of coverage on the probability of occurrence of an unsafe system failure.

7.
Adaptive compensation for infinite number of actuator failures or faults   (total citations: 1; self-citations: 0; citations by others: 1)
It is both theoretically and practically important to investigate the problem of accommodating an infinite number of actuator failures or faults in controlling uncertain systems. However, there is still no result available in developing adaptive controllers to address this problem. In this paper, a new adaptive failure/fault compensation control scheme is proposed for parametric strict feedback nonlinear systems. The techniques of nonlinear damping and parameter projection are employed in the design of controllers and parameter estimators, respectively. It is proved that the boundedness of all closed-loop signals can still be ensured in the case of an infinite number of failures or faults, provided that the time interval between two successive changes of failure/fault pattern is bounded below by an arbitrary positive number. The performance of the tracking error in the mean square sense with respect to the frequency of failure/fault pattern changes is also established. Moreover, asymptotic tracking can be achieved when the total number of failures and faults is finite.

8.
The inherent complexity of modern cloud infrastructures has created the need for innovative monitoring approaches, as state-of-the-art solutions used for other large-scale environments do not address specific cloud features. Although cloud monitoring is nowadays an active research field, a comprehensive study covering all its aspects has not been presented yet. This paper provides a deep insight into cloud monitoring. It proposes a unified cloud monitoring taxonomy, based on which it defines a layered cloud monitoring architecture. To illustrate it, we have implemented GMonE, a general-purpose cloud monitoring tool which covers all aspects of cloud monitoring by specifically addressing the needs of modern cloud infrastructures. Furthermore, we have evaluated the performance, scalability and overhead of GMonE with Yahoo Cloud Serving Benchmark (YCSB), by using the OpenNebula cloud middleware on the Grid’5000 experimental testbed. The results of this evaluation demonstrate the benefits of our approach, surpassing the monitoring performance and capabilities of cloud monitoring alternatives present in state-of-the-art systems such as Amazon EC2 and OpenNebula.

9.
Although the idea of making technology more context aware is an alluring one, this seemingly simple move hides a great deal of complexity. Even simple examples such as a context sensitive mobile phone which knows when not to ring are unlikely to be successful. Any context sensitive technology is likely to make mistakes, like ringing in the middle of a film, or not ringing for an urgent call. Using three examples from fieldwork of alerting systems (two ringing phones and one medical alarm in a hospital), we suggest three guidelines for context systems which could genuinely assist users. First, we argue that context sensitive computing should be used defensively, where incorrect behaviour is tolerable. Second, that technology can provide structures to which people themselves can add context. Third, that technology can communicate context to users, allowing users to make sense of that contextual information themselves. Lastly we argue for an understanding of the long-term use of technology, or dwelling with technology, a process which changes how the world is seen and experienced.

10.
Detecting, locating and repairing faults is a hard task. This holds especially in cases where dependent failures occur in practice. In this paper we present a methodology which is capable of handling dependent failures. For this purpose we extend the model-based diagnosis approach by explicitly representing knowledge about such dependencies, which is stored in a failure dependency graph. Besides the theoretical foundations, we present algorithms for computing diagnoses and repair actions that are based on these extensions. Moreover, we introduce a case study which makes use of a larger control program of an autonomous and mobile robot. The case study shows that the proposed approach can be effectively used in practice.

11.
Integrated architecture for industrial robot programming and control   (total citations: 6; self-citations: 0; citations by others: 6)
As robot control systems are traditionally closed, it is difficult to add supplementary intelligence. Accordingly, a layered system architecture based on a new notion of user views is proposed. Bearing in mind such industrial demands as computing efficiency and simple factory-floor operation, the control layers are parameterized by means of functional operators consisting of pieces of compiled code that can be passed as parameters between the layers. The required interplay between application-specific programs and built-in motion control is thereby efficiently accomplished. The results from experimental evaluation and several case studies suggest that the architecture is also very useful in an industrial context.

12.
13.
One of the key requirements in many multi-agent teams is that agents coordinate specific aspects of their joint task. Unfortunately, this coordination may fail due to intermittent faults in sensor readings, communication faults, etc. A key challenge in the model-based diagnosis (MBD) of coordination failures is to represent a model of the coordination among the agents in a way that allows efficient detection and diagnosis, based on observation of the agents involved. Previously developed mechanisms are useful only for small groups of agents, since they represent the coordination with binary constraints. This paper presents an MBD approach to coordination failures in which non-binary constraints are allowed. This model has two inherent advantages: (1) it can address real problems, and (2) it can address large groups by gathering multiple coordinations in one constraint. To solve the diagnosis problem, we propose a matrix-based approach to represent the basic building blocks of the MBD formalization. Theoretical and empirical evaluations show that this representation is efficient for large-scale teams.

14.
Alex Dvinsky  Roy Friedman 《Software》2015,45(10):1429-1455
This paper reports on our experience in designing and developing Chameleon, a highly portable and adaptable group communication framework for smartphones. Chameleon owes its level of portability to several design choices, including the following: (i) a layered architecture, where the headers of each layer have a standard XML‐based format, enabling automatic, error‐resistant generation of efficient serialization code in any platform; (ii) reliance only on the J2ME library, which serves as least common denominator for Java dialects and facilitates automatic translation to .NET; (iii) having flexible membership models; and (iv) supporting multiple concurrent protocol stacks. Through a single codebase, Chameleon is currently available as an open‐source project for J2ME, J2SE, Android, .NET CF, and .NET. Chameleon is easily extendable and is bundled with tools, configurations, and third‐party code tuned in a way that lifts some of the burden normally associated with multiplatform development for smartphones. Both the header generation from XML and automatic translation to .NET features of Chameleon are readily available to any application that is based on it. Chameleon's threading model separates the execution of internal layers from the application's code and thereby protects each from the other. As we describe in the paper, it simplifies layers' development and allows the protocol stack to easily block application calls when this is required by internal algorithms. Additionally, this model simplifies testing, and an extensive testing framework is supplied along with Chameleon, which is also usable for testing of application‐specific layers. Copyright © 2014 John Wiley & Sons, Ltd.

15.
Teamwork requires that team members coordinate their actions. The representation of the coordination is a key requirement since it influences the complexity and flexibility of reasoning team-members. One aspect of this requirement is detecting coordination faults as a result of intermittent failures of sensors, communication failures, etc. Detection of such faults, based on observations of the behavior of agents, is of prime importance. Though different solutions have been presented thus far, none has presented a comprehensive and efficient resolution for large-scale teams. This paper presents a formal approach to representing multi-agent coordination, and multi-agent observations, using matrix structures. This representation facilitates easy representation of coordination requirements, modularity, flexibility and reuse of existing systems. Based on this representation, we present a novel solution for fault-detection that is both generic and efficient for large-scale teams. We demonstrate the modularity of the representation by presenting a reuse of existing systems and by importing other models (e.g. hierarchical systems) into the new representation. Finally, we extend the representation to support dynamical aspects of complex systems.

16.
Institutional authority is a factor that impacts adoption of IT. Institutional theory incorporates three different but complementary perspectives and we used these to develop a layered analysis of IT adoption in organizations. We used a case study of State Government agencies in Australia to show how layers of authority influenced the adoption or rejection of technology and that such forces varied in their influence over time. Based on this, we proposed the notion of patterns of conformity and non-conformity which recognise the changes in levels of compliance over time as organizational forces arise. In particular, the alignment of layers of authority acts to ensure conformity with or rejection of IT adoption decisions.

17.
Sporadic operations such as rolling upgrade or machine instance redeployment are prone to unpredictable failures in the public cloud, largely because of the inherently high variability of the public cloud. Previous dependability research has established several recovery methods for cloud failures. In this paper, we first propose eight recovery patterns for sporadic operations on the public cloud. We then present the filtering process which filters applicable recovery patterns. We propose an automation mechanism to automatically generate recovery actions for those applicable recovery patterns based on our resource state transition algorithm. We also propose a methodology to evaluate the recovery actions generated for the applicable recovery patterns based on the recovery evaluation metrics of Recovery Time, Recovery Cost, and Recovery Impact. This quantitative evaluation will lead to selection of the acceptable recovery actions. We propose two recovery action selection mechanisms: one is based on user constraints on the recovery evaluation metrics, and the other is based on a Pareto set search algorithm. We implement a recovery service and illustrate its applicability by recovering from errors occurring in the rolling upgrade operation on the AWS cloud.
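
Selection over a Pareto set can be sketched in a few lines of Python. This is only an illustration of Pareto dominance over the three metrics named in the abstract, not the authors' algorithm; the `RecoveryAction` type, the candidate actions, and their metric values are invented for the example, and lower is assumed to be better for every metric.

```python
from typing import NamedTuple


class RecoveryAction(NamedTuple):
    name: str
    time_s: float   # Recovery Time (hypothetical units: seconds)
    cost: float     # Recovery Cost (hypothetical units)
    impact: float   # Recovery Impact (hypothetical score)


def dominates(a, b):
    """True if a is no worse than b on every metric and strictly better on at least one."""
    a_metrics, b_metrics = a[1:], b[1:]
    return (all(x <= y for x, y in zip(a_metrics, b_metrics))
            and any(x < y for x, y in zip(a_metrics, b_metrics)))


def pareto_set(actions):
    """Keep only the actions that no other action dominates."""
    return [a for a in actions
            if not any(dominates(b, a) for b in actions)]


candidates = [
    RecoveryAction("retry", 30, 1, 2),
    RecoveryAction("rollback", 120, 3, 1),
    RecoveryAction("redeploy", 150, 4, 2),
]
selected = [a.name for a in pareto_set(candidates)]
```

With these made-up numbers, "redeploy" is dropped because "retry" is at least as good on all three metrics, while "retry" and "rollback" both survive because each is better than the other on some metric.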

18.
Software health management (SWHM) is an emerging field which addresses the critical need to detect, diagnose, predict, and mitigate adverse events due to software faults and failures. These faults could arise for numerous reasons including coding errors, unanticipated faults or failures in hardware, or problematic interactions with the external environment. This paper demonstrates a novel approach to software health management based on a rigorous Bayesian formulation that monitors the behavior of software and operating system, performs probabilistic diagnosis, and provides information about the most likely root causes of a failure or software problem. Translation of the Bayesian network model into an efficient data structure, an arithmetic circuit, makes it possible to perform SWHM on resource-restricted embedded computing platforms as found in aircraft, unmanned aircraft, or satellites. SWHM is especially important for safety critical systems such as aircraft control systems. In this paper, we demonstrate our Bayesian SWHM system on three realistic scenarios from an aircraft control system: (1) aircraft file-system based faults, (2) signal handling faults, and (3) navigation faults due to inertial measurement unit (IMU) failure or compromised Global Positioning System (GPS) integrity. We show that the method successfully detects and diagnoses faults in these scenarios. We also discuss the importance of verification and validation of SWHM systems.

19.
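Research on a middleware-based quality-of-context management framework in pervasive computing environments   (total citations: 1; self-citations: 0; citations by others: 1)
郑笛  王俊  贲可荣 《计算机科学》 (Computer Science) 2011,38(11):127-130
With the rapid development of information technology, distributed computing is gradually evolving into pervasive computing, with the ultimate goal of merging information space and physical space and providing users with pervasive, intelligent services. A major difficulty in reaching this goal is how to continuously and effectively monitor, capture, and interpret environment-related context information so as to ensure accurate context awareness. Many researchers have worked on context-aware pervasive applications, but most process the raw context directly without considering the influence of quality of context (QoC). This paper therefore proposes a middleware-based QoC management framework, which provides effective and reliable context services to context-aware services and application users through control mechanisms at different levels, such as QoC threshold management and the discarding of duplicate and inconsistent context.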

20.
Massimo Ficco  Stefano Russo 《Software》2009,39(13):1095-1125
Location‐aware computing is a form of context‐aware mobile computing that refers to the ability of providing users with services that depend on their position. Locating the user terminal, often called positioning, is essential in this form of computing. Towards this aim, several technologies exist, ranging from personal area networking, to indoor, outdoor, and up to geographic area systems. Developers of location‐aware software applications face a number of design choices that typically depend on the chosen technology. This work addresses the problem of easing the development of pull location‐aware applications, by allowing uniform access to multiple heterogeneous positioning systems. Towards this aim, the paper proposes an approach to structure location‐aware mobile computing systems in a way independent of positioning technologies. The approach consists in structuring the system into a layered architecture, which provides application developers with a standard Java Application Programming Interface (JSR‐179 API), and encapsulates location data management and technology‐specific positioning subsystems into lower layers with clear interfaces. In order to demonstrate the proposed approach we present the development of HyLocSys. It is an open hybrid software architecture designed to support indoor/outdoor applications, which allows the uniform (combined or separate) use of several positioning technologies. HyLocSys uses a hybrid data model, which allows the integration of different location information representations (using symbolic and geometric coordinates). Moreover, it supports handset‐ and infrastructure‐based positioning approaches while respecting the privacy of the user. The paper presents a prototype implementation of HyLocSys for heterogeneous scenarios. It has been implemented and tested on several platforms and mobile devices. Copyright © 2009 John Wiley & Sons, Ltd.
