首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
《国际计算机数学杂志》2012,89(3-4):201-212
This paper is the second of a two-part series exploring the subtle correctness criterion of the absence of livelocks in parallel programs. In this paper we are concerned with the issue of proving this correctness criterion. It is shown that livelocks are not preserved by reduction, implying that reduction cannot be used directly in proving the absence of livelocks. Two applicable proof techniques are also presented. One is based on the notion of establishing sufficient conditions for livelock-freedom; the other is an extension of the well-founded set method for proving termination in sequential programs.  相似文献   

2.
Parallel simulation of parallel programs for large datasets has been shown to offer significant reduction in the execution time of many discrete event models. The paper describes the design and implementation of MPI-SIM, a library for the execution driven parallel simulation of task and data parallel programs. MPI-SIM can be used to predict the performance of existing programs written using MPI for message passing, or written in UC, a data parallel language, compiled to use message passing. The simulation models can be executed sequentially or in parallel. Parallel execution of the models are synchronized using a set of asynchronous conservative protocols. The paper demonstrates how protocol performance is improved by the use of application-level, runtime analysis. The analysis targets the communication patterns of the application. We show the application-level analysis for message passing and data parallel languages. We present the validation and performance results for the simulator for a set of applications that include the NAS Parallel Benchmark suite. The application-level optimization described in the paper yielded significant performance improvements in the simulation of parallel programs, and in some cases completely eliminated the synchronizations in the parallel execution of the simulation model  相似文献   

3.
We investigate models for programming and specifying in which higher-order functions and nondeterminacy (both demonic and angelic) coexist. The models are built using predicate transformers, binary multirelations, state transformers, and free lattices over a poset. We show there exist suitable models in each approach, and that they are isomorphic. The models support an algebra of nondeterminacy which we use to prove that the classical list fusion law holds even in the presence of nondeterminacy.  相似文献   

4.
Summary We prove that recursive assertions are enough for proofs of parallel programs considered in Owicki and Gries [7]. In other words, we prove that for any parallel program S and recursive assertions p and q if {p} S{q} is true under the standard interpretation in natural numbers then all intermediate assertions needed in the proof can be chosen recursive. Finally, we show that if auxiliary variables are used only as program counters then the above result does not hold.  相似文献   

5.
Parallel execution is normally used to decrease the amount of time required to run a program. However, the parallel execution may require far more space than that required by the sequential execution. Worse yet, the parallel space requirement may be very much more difficult to predict than the sequential space requirement because there are more factors to consider. These include essentially nondeterministic factors that can influence scheduling, which in turn may dramatically influence space requirements. We survey some scheduling algorithms that attempt to place bounds on the amount of time and space used during parallel execution. We also outline a direction for future research. This direction takes us into the area of functional programming, where the declarative nature of the languages can help the programmer to produce correct parallel programs, a feat that can be difficult with procedural languages. Currently the high-level nature of functional languages can make it difficult for the programmer to understand the operational behavior of the program. We look at some of the problems in this area, with the goal of achieving a programming environment that supports correct, efficient parallel programs.  相似文献   

6.
Performance visualization uses graphical display techniques to analyze performance data and improve understanding of complex performance phenomena. Current parallel performance visualizations are predominantly two-dimensional. A primary goal of our work is to develop new methods for rapidly prototyping multidimensional performance visualizations. By applying the tools of scientific visualization, we can prototype these next-generation displays for performance visualization-if not implement them as end user tools-using existing software products and graphical techniques that physicists, oceanographers, and meteorologists have used for several years  相似文献   

7.
This article intends to provide some new insights into concurrency using ideas from the theory of dynamical systems. Inherently discrete concurrency corresponds to a parallel continuous concept: a discrete state space corresponds to a differential manifold, an execution path corresponds to a flow line of a dynamical system. To model non-determinacy within dynamical systems, we introduce a new geometrical object, a section cone. A section cone is a convex set in the space of vector fields, all elements having the same singular points. We show that it is enough to consider flow lines of a single vector field in order to capture the behavior of all flow lines in the section cone up to homotopy (corresponding to equivalence of executions).  相似文献   

8.
Relay Ladder Logic (RLL) [5] is a programming language widely used for complex embedded control applications such as manufacturing and amusement park rides. The cost of bugs in RLL programs is extremely high, often measured in millions of dollars (for shutting down a factory) or human safety (for rides). In this paper, we describe our experience in applying constraint-based program analysis techniques to analyze production RLL programs. Our approach is an interesting combination of probabilistic testing and program analysis, and we show that our system is able to detect bugs with high probability, up to the approximations made by the conservative program analysis. We demonstrate that our analysis is useful in detecting some flaws in production RLL programs that are difficult to find by other techniques.  相似文献   

9.
Visualizing the performance of parallel programs   总被引:1,自引:0,他引:1  
ParaGraph, a software tool that provides a detailed, dynamic, graphical animation of the behavior of message-passing parallel programs and graphical summaries of their performance, is presented. ParaGraph animates trace information from actual runs to depict behavior and obtain the performance summaries. It provides twenty-five perspectives on the same data, lending insight that might otherwise be missed. ParaGraph's features are described, its use is explained, its software design is briefly discussed, and its displays are examined in some detail. Future work on ParaGraph is indicated  相似文献   

10.
This paper describes how Crystal, a language based on familiar mathematical notation and lambda calculus, addresses the issues of programmability and performance for parallel supercomputers. Some scientifc programmers and theoreticians may ask, “What is new about Crystal?” or “How is it different from existing functional languages?” The answers lie in its model of parallel computation and a theory of parallel program optimization, and we examine this in the text to follow. We illustrate the power of our approach with benchmarks of compiled parallel code from Crystal source. The target machines are hypercube multiprocessors with distributed memory, on which it is considered difficult for functional programs to achieve high efficiency.  相似文献   

11.
This article describes an execution model for the parallel procedural programming paradigm, which combines multithreading and communications. The model is used to prove sufficient conditions to guarantee the equivalence between two executions of the same program. An efficient mechanism for recording and replaying deterministically parallel procedural programs is derived from the model and implemented in a prototype. Performed on the prototype, systematic measurements of the time overhead of recording traces for replaying various program models indicate that this overhead remains very low.  相似文献   

12.
13.
This paper deals with the construction and use of simple synthetic programs that model the behavior of more complex, real parallel programs. Synthetic programs can be used in many ways: to construct an easily ported suite of benchmark programs, to experiment with alternate parallel implementations of a program without actually writing them, and to predict the behavior and performance of an algorithm on a new or hypothetical machine. Synthetic programs are constructed easily from scratch and from existing programs and can even be constructed using nothing but information obtained from traces of the real program's execution.  相似文献   

14.
Curare, the program restructurer described in this paper automatically transforms a sequential Lisp program into an equivalent concurrent program that runs on a multiprocessor.Data dependences constrain the program's concurrent execution because, in general, two conflicting statements cannot execute in a different order without affecting the program's result. Not all dependences are essential to produce the program's result.Curare attempts to transform the program so it computes its result with fewer conflicts. An optimized program will execute with less synchronization and more concurrency. Curare then examines loops in a program to find those that are unconstrained or lightly constrained by dependences. By necessity,Curare treats recursive functions as loops and does not limit itself to explicit program loops. Recursive functions offer several advantages over explicit loops since they provide a convenient framework for inserting locks and handling the dynamic behavior of symbolic programs. Loops that are suitable for concurrent execution are changed to execute on a set of concurrent server processes. These servers execute single loop iterations and therefore need to be extremely inexpensive to invoke.Restructured programs execute significantly faster than the original sequential programs. This improvement is large enough to attract programmers to a multiprocessor, particularly since it requires little effort on their part.This research was funded by DARPA contract numbers N00039-85-C-0269 (SPUR) and N00039-84-C-0089 (XCS) and by an NSF Presidential Young Investigator award to Paul N. Hilfinger. Additional funding came from the California MICRO program (in conjunction with Texas Instruments, Xerox, Honeywell, and Phillips/Signetics).  相似文献   

15.
The peculiarities of transforming functional dataflow parallel programs into programs with finite resources are analysed. It is considered how these transformations are affected by the usage of asynchronous lists, the return of delayed lists and the variation of the data arrival pace relative to the time of its processing. These transformations allow us to generate multiple programs with static parallelism based on one and the same functional dataflow parallel program.  相似文献   

16.
R. Rubinfeld 《Algorithmica》1996,15(4):287-301
Program correctness for parallel programs is an even more problematic issue than for serial programs. We extend the theory of program result checking to parallel programs, and find general techniques for designing such result checkers that work for many basic problems in parallel computation. These result checkers are simple to program and are more efficient than the actual computation of the result. For example, sorting, multiplication, parity, the all-pairs shortest-path problem and majority all have constant depth result checkers, and the result checkers for all but the last problem use a linear number of processors. We show that there are P-complete problems (evaluating straight-line programs, linear programming) that have very fast, even constant depth, result checkers.This research was done while at the Computer Science Division, University of California, Berkeley, and the International Computer Science Institute, Berkeley, California. Supported in part by an IBM Graduate Fellowship and NSF Grant No. CCR 88-13632.  相似文献   

17.
Estimating and optimizing performance for parallel programs   总被引:1,自引:0,他引:1  
Fahringer  T. 《Computer》1995,28(11):47-56
The article describes P3T, a parameter-based performance prediction tool that estimates performance for parallel programs running on distributed-memory parallel architectures. P3 T has been carefully designed to address all of the above performance estimation issues. To achieve high estimation accuracy, P 3T aggressively exploits compiler analysis and optimization information. Our method is based on modeling loop iteration spaces, array access patterns, and data distributions by intersection and volume operations on n-dimensional polytopes. The most critical architecture-specific factors, such as cache line sizes, number of cache lines available, routing policy, start-up times, message transfer time per byte, and so forth, are modeled to reflect the performance impact of the target machine. P3T has been developed in the context of the Vienna Fortran Compilation Systems (VFCS), a state-of-the-art parallelization tool for distributed-memory systems. VFCS translates Fortran programs into explicitly parallel message-passing programs. P 3T successfully guides the interactive and automatic restructuring of programs under this system. The article describes the underlying compilation and programming model and discusses the most critical design decisions made for P3T; in addition, it outlines the implementation of the parallel program parameters. Also described are the VFCS context under which P3T is applied and the P3T graphical user interface  相似文献   

18.
19.
We model a deterministic parallel program by a directed acyclic graph of tasks, where a task can execute as soon as all tasks preceding it have been executed. Each task can allocate or release an arbitrary amount of memory (i.e., heap memory allocation can be modeled). We call a parallel schedule “space efficient” if the amount of memory required is at most equal to the number of processors times the amount of memory required for some depth-first execution of the program by a single processor. We describe a simple, locally depth-first scheduling algorithm and show that it is always space efficient. Since the scheduling algorithm is greedy, it will be within a factor of two of being optimal with respect to time. For the special case of a program having a series-parallel structure, we show how to efficiently compute the worst case memory requirements over all possible depth-first executions of a program. Finally, we show how scheduling can be decentralized, making the approach scalable to a large number of processors when there is sufficient parallelism  相似文献   

20.
Programs whose parallelism stems from multiple recursion form an interesting subclass of parallel programs with many practical applications. The highly irregular shape of many recursion trees makes it difficult to obtain good load balancing with small overhead. We present a system, called REAPAR, that executes recursive C programs in parallel on SMP machines. Based on data from a single profiling run of the program, REAPAR selects a load-balancing strategy that is both effective and efficient and it generates parallel code implementing that strategy. The performance obtained by REAPAR on a diverse set of benchmarks matches that published for much more complex systems requiring high-level problem-oriented explicitly parallel constructs. A case study even found REAPAR to be competitive to handwritten (low-level, machine-oriented) thread-parallel code  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号