
1.

Optimal tracing and replay for debugging message-passing parallel programs

Robert H. B. Netzer Barton P. Miller 《The Journal of supercomputing》1995,8(4):371-388

A common debugging strategy involves reexecuting a program (on a given input) over and over, each time gaining more information about bugs. Such techniques can fail on message-passing parallel programs. Because of nondeterminacy, different runs on the given input may produce different results. This nonrepeatability is a serious debugging problem, since an execution cannot always be reproduced to track down bugs. This paper presents a technique for tracing and replaying message-passing programs. By tracing the order in which messages are delivered, a reexecution can be forced to deliver messages in their original order, reproducing the original execution. To reduce the overhead of such a scheme, we show that the delivery order of only messages involved in races need be traced (and not every message). Our technique makes run-time decisions to detect and trace racing messages and is usually optimal in the sense that the minimal number of racing messages is traced. Experiments indicate that only 1% of the messages are often traced, gaining a reduction of two orders of magnitude over traditional techniques that trace every message. These traces allow an execution to be reproduced any number of times for debugging. Our work is novel in that we adaptively decide what to trace, and trace only those messages that introduce nondeterminacy. With our strategy, large reductions in trace size allow long-running programs to be replayed that were previously unmanageable. In addition, the reduced tracing requirements alleviate tracing bottlenecks, allowing executions to be debugged with substantially lower execution time overhead. This work was supported in part by National Science Foundation grants CCR-8815928 and CCR-9100968, Office of Naval Research grant N00014-89-J-1222, and a grant from Sequent Computer Systems, Inc.
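
The record-and-replay idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it logs every delivery rather than only racing messages, and the function name `run`, the per-sender channel layout, and the seeded random scheduler are all assumptions made for illustration.

```python
import random
from collections import deque


def run(channels, trace=None, seed=None):
    """Deliver all messages from per-sender FIFO channels to one receiver.

    Record mode (trace is None): the next sender is picked at random,
    standing in for nondeterministic arrival order, and its id is logged.
    Replay mode (trace given): the logged sender ids dictate the order.
    Returns (delivered_messages, trace).
    """
    queues = {sender: deque(msgs) for sender, msgs in channels.items()}
    rng = random.Random(seed)
    recording = trace is None
    log = [] if recording else list(trace)
    delivered = []
    step = 0
    while any(queues.values()):
        if recording:
            # Any sender with a pending message may be delivered next.
            ready = sorted(s for s, q in queues.items() if q)
            sender = rng.choice(ready)
            log.append(sender)
        else:
            # Force the delivery order observed in the original run.
            sender = log[step]
        delivered.append(queues[sender].popleft())
        step += 1
    return delivered, log


# Hypothetical workload: three senders racing to one receiver.
channels = {"p1": ["a1", "a2"], "p2": ["b1"], "p3": ["c1", "c2"]}
original, trace = run(channels, seed=7)
replayed, _ = run(channels, trace=trace)
print(replayed == original)
```

The trace here has one entry per message; the paper's contribution is to shrink it by logging only the deliveries that actually race.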

2.

Interrupt replay: a debugging method for parallel programs with interrupts

KMR Audenaert LJ Levrouw 《Microprocessors and Microsystems》1994,18(10):601-612

The behaviour of programs for multiprocessors may be indeterminate, due to processor timing variations. This poses a problem for cyclic debugging, since a bug may disappear from one execution to another. Replay is an elegant solution to this problem, in which ‘sufficient’ information is recorded in a log. This information is then used to control subsequent executions of the same program so that repeatability is guaranteed. Interrupts are another source of non-determinism, even in sequential programs. This paper presents an extension of the well-known Instant Replay method, termed Interrupt Replay, for replaying programs in the presence of interrupts. The correctness of Interrupt Replay is based on the assumption that there are no interrupt races: an interrupt service routine must not access data that is also accessed by the foreground process whenever the interrupt is enabled. If such races are present then replay may fail to produce deterministic results. This assumption is similar to the basic assumption of Instant Replay that shared variables are properly protected by mutual exclusion. Also as in Instant Replay, it is assumed that the behaviour of the environment (input data, external interrupts) is replayed by some other tracing mechanism.

3.

A replay-based method for reproducing errors in concurrent programs

罗清宙 (Luo Qingzhou) 《Computer Engineering and Design (计算机工程与设计)》2010,31(13)

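To help programmers detect errors in concurrent programs, this paper proposes a method for reproducing concurrency-error scenarios. Java bytecode instrumentation is used to generate a record version and a replay version of the program. While the record version runs, the logical order of execution among threads is logged automatically, and test cases are generated automatically after the program crashes. Run on the replay version, these test cases reproduce the concurrency error deterministically. A prototype tool for Java programs was implemented, and experimental results show that the prototype can reproduce errors in concurrent programs with acceptable performance overhead.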

4.

A formal semantics for debugging synchronous message passing-based concurrent programs

He Li Jie Luo Wei Li 《Science China Information Sciences》2014,57(12):1-13

During a human-exoskeleton collaboration, the interaction torque on the exoskeleton resulting from the human cannot be clearly determined and conducted by normal physical models. This is because the torque depends not only on the direction and orientation of both the human operator and the exoskeleton but also on the physical properties of each operator. In this paper, we present our investigations on the relationship between the interaction torques and the dynamic factors of human-exoskeleton systems using state-of-the-art learning techniques (nonparametric regression techniques) and provide control applications based on the findings. Experimental data was collected from various human operators when they were attached to the designed exoskeleton to perform unconstrained motions with and without control. The results showed that regardless of how the experiments were done and which learning method was chosen, the resulting interaction could be best represented by time-varying non-linear mappings of the operator's angular position, and the exoskeleton's angular position, velocity, and acceleration during locomotion. This finding has been applied to advanced controls of lower exoskeletal robots in order to improve their performance while interacting with humans.

5.

A formal semantics for debugging synchronous message passing-based concurrent programs

LI He LUO Jie LI Wei 《Science China Information Sciences》2014,(12):194-211

In this paper, we propose a semantic framework to debug synchronous message passing-based concurrent programs, which are increasingly useful as parallel computing and distributed systems become more and more pervasive. We first design a concurrent programming language model to uniformly represent existing concurrent programming languages. Compared to sequential programming languages, this model contains communication statements, i.e., sending and receiving statements, and a concurrent structure to represent communication and concurrency. We then propose a debugging process consisting of a tracing and a locating procedure. The tracing procedure re-executes a program with a failed test case and uses specially designed data structures to collect useful execution information for locating bugs. We provide for the tracing procedure a structural operational semantics to represent synchronous communication and concurrency. The locating procedure backward locates the ill-designed statement by using information obtained in the tracing procedure, generates a fix equation, and tries to fix the bug by solving the fix equation. We also propose a structural operational semantics for the locating procedure. We supply two examples to test our proposed operational semantics.

6.

Validating a demonstration tool for graphics-assisted debugging of Ada concurrent programs

Feldman M.B. Moran M.L. 《IEEE Transactions on Software Engineering》1989,15(3):305-313

A demonstration-quality graphics-assisted debugger is developed for intertask communication in Ada. Based on the static task-specification diagrams of G. Booch (Software Engineering with Ada, Benjamin/Cummings, 1983), the debugger animates the activity of a collection of communicating tasks, and it runs on a DEC GIGI terminal connected to a VAX 11-780 under TeleSoft's partial Ada compiler. The model has been subjected to empirical validation, using undergraduate students as experimental subjects. Subjects were required to debug erroneous tasking programs using both the graphical debugger and a textual one. It is concluded that although the problems to be addressed in the development and evaluation of a graphical debugging tool for Ada tasks are nontrivial, the benefits could be worth the effort.

7.

The ‘Hoare logic’ of concurrent programs

Leslie Lamport 《Acta Informatica》1980,14(1):21-37

Hoare's logical system for specifying and proving partial correctness properties of sequential programs is generalized to concurrent programs. The basic idea is to define the assertion {P} S {Q} to mean that if execution is begun anywhere in S with P true, then P will remain true until S terminates, and Q will be true if and when S terminates. The predicates P and Q may depend upon program control locations as well as upon the values of variables. A system of inference rules and axiom schemas is given, and a formal correctness proof for a simple program is outlined. We show that by specifying certain requirements for the unimplemented parts, correctness properties can be proved without completely implementing the program. The relation to Pnueli's temporal logic formalism is also discussed.

8.

A class library for implementing, testing, and debugging concurrent programs

Richard H. Carver Yu Lei 《International Journal on Software Tools for Technology Transfer (STTT)》2010,12(1):69-88

We describe the Modern Multithreading (MM) class library. MM is a class library consisting of thread and synchronization classes that provide significant support for testing and debugging multithreaded programs. The synchronization classes implement commonly used synchronization objects such as semaphores, monitors, and asynchronous and synchronous message passing channels, for programs that run on a single computer or on a distributed system. MM uses controlled executions to provide program tracing and replay and to support a number of implementation-based and specification-based testing techniques, including non-deterministic and deterministic testing and several forms of reachability testing. MM is portable and easy to use, and has been implemented in Java and C++, with C++ versions for the POSIX Pthreads library and for the Windows Win32 API.

9.

Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems

Yung-Chang Chiu Tyng-Yeu Liang 《Parallel Computing》2011,37(1):11-25

Distributed shared memory (DSM) allows parallel programs to run on distributed computers by simulating a global virtual shared memory, but data racing bugs may easily occur when the threads of a multi-threaded process concurrently access the physically distributed memory. Earlier tools to help programmers locate data racing bugs in non-DSM parallel programs are not easily applied to DSM systems. This study presents the data race avoidance and replay scheme (DRARS) to assist debugging parallel programs on DSM or multi-core systems. DRARS is a novel tool which controls the consistency protocol of the target program, automatically preventing a large class of data racing bugs when the parallel program is subsequently run, obviating much of the need for manual debugging. For data racing bugs that cannot be avoided automatically, DRARS performs a deterministic replay-type function on DSM systems, faithfully reproducing the behavior of the parallel program during run time. Because one class of data racing bugs has already been eliminated, the remaining manual debugging task is greatly simplified. Unlike previous debugging methods, DRARS does not require that the parallel program be written in a specific style or programming language. Moreover, DRARS can be implemented in most consistency protocols. In this paper, DRARS is realized and verified in real experiments using the eager release consistency protocol on a DSM system with various applications.

10.

Representing and reasoning about concurrent actions with abductive logic programs

Renwei Li Luís Moniz Pereira 《Annals of Mathematics and Artificial Intelligence》1997,21(2-4):245-303


11.

Execution replay of parallel procedural programs

《Journal of Systems Architecture》2000,46(10):835-849

This article describes an execution model for the parallel procedural programming paradigm, which combines multithreading and communications. The model is used to prove sufficient conditions to guarantee the equivalence between two executions of the same program. An efficient mechanism for recording and deterministically replaying parallel procedural programs is derived from the model and implemented in a prototype. Performed on the prototype, systematic measurements of the time overhead of recording traces for replaying various program models indicate that this overhead remains very low.

12.

A confluent semantic basis for the analysis of concurrent constraint logic programs

《The Journal of Logic Programming》1997,30(1):53-81

The standard operational semantics of concurrent constraint logic languages is not confluent in the sense that different schedulings of processes may result in different program behaviors. While implementations are free to choose specific scheduling policies, analyses should be correct for all implementations. Moreover, in the presence of parallelism, it is usually not possible to determine how processes will actually be scheduled. Efficient program analysis is therefore difficult as all process schedulings must be considered. To overcome this problem, we introduce a confluent semantics which closely approximates the standard (nonconfluent) semantics. This semantics provides a basis for efficient and accurate program analysis for these languages. To illustrate the usefulness of this approach, we sketch analyses based on abstract interpretations of the confluent semantics which determine if a program is suspension- and local suspension-free.

13.

Replay-based debugging of occam programs

A. Cimitile U. De Carlini U. Villano 《Software Testing, Verification and Reliability》1993,3(2):83-100

Parallel programs are intrinsically non-deterministic, and therefore the techniques of cyclical debugging that are commonly used for sequential programs are not suitable for parallel ones. This paper proposes a method to reproduce Occam program behaviour. Saving information on the timer values input by the program and the guards selected at run-time on alternative commands allows program replay, i.e. it makes it possible to re-execute the program deterministically with the same inputs following the same instruction path. This enables the software developer to use tools such as debuggers and intrusive monitors to help identify program faults. After discussing possible implementations of the proposed technique, IRD (an interactive replay debugger for Occam programs) is described. Finally, the use of the IRD in a sample debug session is presented as an example.

14.

A noninterference monitoring and replay mechanism for real-time software testing and debugging

Tsai J.J.P. Fang K.-Y. Chen H.-Y. Bi Y.-D. 《IEEE Transactions on Software Engineering》1990,16(8):897-916

A noninterference monitoring and replay mechanism using the recorded execution history of a program to control the replay of the program behavior and guarantee the reproduction of its errors is presented. Based on this approach, a noninterference monitoring architecture has been developed to collect the program execution data of a target real-time software system without affecting its execution. A replay mechanism designed to control the reproduction of the program behavior as well as the examination of the states of the target system and its behavior is presented. The monitoring system has been implemented using a Motorola 68000 computer in a Unix system environment. An example is used to illustrate how the mechanism detects timing errors of real-time software systems.

15.

Lazy debugging of lazy functional programs

Robin M. Snyder 《New Generation Computing》1990,8(2):139-161

The debugging of fully lazy functional programs can require searching a very large reduction-history space containing many delayed computations. A debugger should provide a means to obtain a source level representation of the computation, which can be large, and a means to select the appropriate part of the computation to investigate, which can be difficult. A method is presented to compile functional programs to combinator code such that a source-like representation of any part of a computation graph can be efficiently reconstructed at run-time. Other less efficient methods require excessive compile-time guidance as to the specific part of the computation to be investigated. Reconstruction, forward reduction, and a history-rollback mechanism combine to make the entire source-like reduction-history space dynamically available at run-time. The deferring of debugging decisions until run-time is called lazy debugging. Once the computation-sequence is meaningfully and efficiently available, the problem of debugging becomes that of localizing the search for the error. Some searching issues are discussed with respect to graph browsing and user-interface design. The method shows promise as a programmer tool to debug programs and to informally reason about the time and space behavior of fully lazy functional programs, a nonintuitive process due to the subtleness of sharing and delayed computations.

16.

Monitoring and debugging distributed realtime programs

Paul S. Dodd Chinya V. Ravishankar 《Software: Practice and Experience》1992,22(10):863-877

In this paper we describe the design and implementation of an integrated monitoring and debugging system for a distributed real-time computer system. The monitor provides continuous, transparent monitoring capabilities throughout a real-time system's lifecycle with bounded, minimal, predictable interference by using software support. The monitor is flexible enough to observe both high-level events that are operating system- and application-specific, as well as low-level events such as shared variable references. We present a novel approach to monitoring shared variable references that provides transparent monitoring with low overhead. The monitor is designed to support tasks such as debugging real-time applications, aiding real-time task scheduling, and measuring system performance. Since debugging distributed real-time applications is particularly difficult, we describe how the monitor can be used to debug distributed and parallel applications by deterministic execution replay.

17.

Automatic performance debugging of SPMD-style parallel programs

Xu Liu Jianfeng Zhan Kunlin Zhan Dan Meng 《Journal of Parallel and Distributed Computing》2011,71(7):925-937

Automatic performance debugging of parallel applications includes two main steps: locating performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this challenging issue in two ways: first, several previous efforts automate locating bottlenecks, but present results in a confined way that only identifies performance problems with a priori knowledge; second, several tools take exploratory or confirmatory data analysis to automatically discover relevant performance data relationships, but these efforts do not focus on locating performance bottlenecks or uncovering their root causes. The single program, multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in terms of two features: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed.
Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively; meanwhile, we present two searching algorithms to locate bottlenecks; second, on the basis of the rough set theory, we propose an innovative approach to automatically uncover root causes of bottlenecks; third, on the cluster systems with two different configurations, we use two production applications, written in Fortran 77, and one open source code, MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.

18.

Canonical logic programs

《The Journal of Logic Programming》1986,3(2):143-155

Consider the class of programs P where the greatest fixpoint of T_P is equal to the complement of the finite failure set of P. Programs in this class possess some important properties which others do not. The main result in this paper proves that this class is representative of all programs. Specifically, we call the programs in this class canonical and we prove that for any program P₁, there exists a semantically equivalent program P₂ which is canonical.
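
The defining property can be written out as a short sketch. The abstract names only T_P and the finite failure set; the symbols B_P (Herbrand base) and FF_P, and the reading of T_P as the usual immediate consequence operator, are assumptions about the intended standard notation.

```latex
% Assumed standard notation (not spelled out in the abstract):
%   B_P  : Herbrand base of P
%   FF_P : finite failure set of P
%   T_P  : immediate consequence operator of P
\[
  T_P(I) = \{\, A \in B_P \mid A \leftarrow B_1,\dots,B_n
           \text{ is a ground instance of a clause of } P,\
           \{B_1,\dots,B_n\} \subseteq I \,\}
\]
\[
  P \text{ is canonical} \iff \operatorname{gfp}(T_P) = B_P \setminus FF_P
\]
\[
  \text{Main result: } \forall P_1 \;\exists P_2 :\;
  P_2 \text{ is canonical and semantically equivalent to } P_1
\]
```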

19.

Foundations of declarative debugging in arbitrary logic programming

《International journal of man-machine studies》1990,32(2):215-232


20.

Conceptual logic programs

Stijn Heymans Davy Van Nieuwenborgh Dirk Vermeir 《Annals of Mathematics and Artificial Intelligence》2006,47(1-2):103-137
