Similar literature
1.
This paper describes theoretical and practical aspects of a partial evaluator for a parallel lambda language. The parallel language combines the lambda calculus with a message-passing communication mechanism. It can be used to write a denotational semantics of a programming language that extracts the parallelism in programs. From this denotational definition, the partial evaluator can generate a parallel compiler for the language by self-application. The key technique of partial evaluation is binding-time analysis, which determines in advance which parts of the source program can be evaluated during partial evaluation and which cannot. A binding-time analysis based on type inference is described. A new type, chcode, is introduced into the type system to denote the type of expressions containing residual channel operations. A well-formedness criterion is given which ensures that partial evaluation neither commits type errors nor changes the sequence of channel operations. Before binding-time analysis, channel analysis is used to analyze the communication relationship between send and receive processes.

2.
Jones optimality implies that a program specializer is strong enough to remove an entire level of self-interpretation. This paper argues that Jones optimality, which was originally devised as a criterion for self-applicable specializers, plays a fundamental role in the use of a binding-time improvement prepass prior to specialization. We establish that, regardless of the binding-time improvements applied to a subject program (no matter how extensively), a specializer that is not Jones-optimal is strictly weaker than a specializer that is Jones-optimal. We describe the main approaches that increase the strength of a specializer without requiring its modification, namely incremental specialization and the interpretive approach, and show that they are equally powerful when the specializer is bti-universal. Since this includes the generation of program specializers from interpreters, the theoretical possibility of bootstrapping powerful specializers is established.

3.
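Wang Mingwen, Sun Yongqiang. Journal of Software (软件学报), 2001, 12(8): 1154-1161
This paper discusses a partial evaluator for an object-oriented lambda calculus, which extends the lambda calculus with an object mechanism. The partial evaluator is constructed with the traditional three-step method. First, a meta-interpreter for the object-oriented lambda calculus is defined. Then a binding-time analysis for the calculus is proposed, which decides which computations can be completed at compile time and which must be left for run time. Finally, the partial evaluator is defined. Correctness proofs for the meta-interpreter and the partial evaluator are also given.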

4.
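This paper describes a self-applicable static (offline) partial evaluator for a flowchart language. It consists of four parts: live-variable analysis, abstract analysis, annotation, and specialization. Performing the abstract analysis on top of live-variable analysis yields a more precise abstract interpretation than earlier abstract analyses and is more conducive to producing higher-quality residual programs. Transition compression is carried out directly during specialization.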

5.
The program transformation principle called partial evaluation has interesting applications in compilation and compiler generation. Self-applicable partial evaluators may be used for transforming interpreters into corresponding compilers and even for the generation of compiler generators. This is useful because interpreters are significantly easier to write than compilers, but run much slower than compiled code. A major difficulty in writing compilers (and compiler generators) is the thinking in terms of distinct binding times: run time and compile time (and compiler generation time). The paper gives an introduction to partial evaluation and describes a fully automatic though experimental partial evaluator, called mix, able to generate stand-alone compilers as well as a compiler generator. Mix partially evaluates programs written in Mixwell, essentially a first-order subset of statically scoped pure Lisp. For compiler generation purposes it is necessary that the partial evaluator be self-applicable. Even though the potential utility of a self-applicable partial evaluator has been recognized since 1971, a 1984 version of mix appears to be the first successful implementation. The overall structure of mix and the basic ideas behind its way of working are sketched. Finally, some results of using a version of mix are reported.

6.
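Parallel computing paradigms supported by the p-HPF compiler in a cluster environment   Cited by: 2 (self-citations: 0, citations by others: 2)
p-HPF is a parallel compilation system developed to conform to the HPF (High Performance Fortran) specification. Multi-paradigm parallel computing built around HPF is a foundation for developing large parallel applications. The paper first discusses parallel execution paradigms in a cluster environment, including the farm-parallel paradigm, pipeline parallelism, stream-loop parallelism, data parallelism, and combined data parallelism, and analyzes their performance abstractly. It then shows how to implement several typical parallel paradigms using p-HPF's external procedure mechanism, its task-parallel mechanism, and typical parallel statements such as FORALL and INDEPENDENT DO. Example programs are given, run, and their results analyzed.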

7.
Programming multiprocessor parallel architectures is a complex task. This paper describes a block-structured scientific programming language, BLAZE, designed to simplify this task. BLAZE contains array arithmetic, ‘forall’ loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow.

A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture, while neglecting the remainder. This paper describes the features of BLAZE and shows how this language would be used in typical scientific programming.


8.
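Zhang Liyan, Ma Jian, Sun Yan. Microcomputer & Its Applications (微型机与应用), 2011, 30(17): 67-70, 73
A new three-layer parallel genetic algorithm (TPGA) based on Threading Building Blocks (TBB) is proposed. Compared with the traditional genetic algorithm, it improves running efficiency while preserving correctness. The data encoding, task processing, and data decoding of the genetic algorithm are each parallelized, which improves convergence speed. TBB is a code library provided by Intel that can fully express parallelism. The TBB-based TPGA and a serial genetic algorithm (SGA) were implemented in C++. Extensive experiments show that, compared with SGA, TPGA not only converges faster but also reaches the same optimal solution.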

9.
Multicore computers are ubiquitous. Expert developers as well as developers with little experience in parallelism are now asked to create multithreaded software to exploit parallelism in mainstream shared‐memory hardware. However, finding and fixing parallel programming errors is a complex and arduous task. Programmers thus rely on tools such as race detectors that typically focus on reporting errors due to incorrect usage of synchronization constructs or due to missing synchronization. This arsenal of debugging techniques, however, is incomplete. This article presents a new perspective and addresses a largely unexplored direction of defect localization where a wrong usage of nonparallel programming constructs might cause wrong parallel application behavior. In particular, we make a contribution by showing how to use data‐mining techniques to locate defects in multithreaded shared‐memory programs. Our technique analyzes execution anomalies in a condensed representation of the dynamic call graphs of a multithreaded object‐oriented application and identifies methods that contain a defect. Compared with race detectors that concentrate on finding incorrect synchronization, our method is able to reveal a wider range of defects that affect the control flow of a parallel program. Results from controlled experiments show that our data‐mining approach finds not only race conditions in different types of multicore applications but also other errors that cause incorrect parallel program behavior. Data‐mining techniques offer a fruitful new ground for parallel program debugging, and we also discuss long‐term directions for this interesting field. Copyright © 2012 John Wiley & Sons, Ltd.

10.
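An incremental partial evaluator for the LISP language   Cited by: 2 (self-citations: 0, citations by others: 2)
Li Hang, Song Litong, Jin Chengzhi. Journal of Software (软件学报), 1996, 7(8): 492-498
Partial evaluation plays an important role in software optimization, and incremental computation is a technique for avoiding repeated computation. Building on both techniques, this paper implements an incremental partial evaluator for LISP in which function specialization proceeds, as far as possible, from the residual program already produced by the previous run, which considerably improves efficiency.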

11.
Compiling scientific code using partial evaluation   Cited by: 1 (self-citations: 0, citations by others: 1)
Berlin, A.; Weise, D. Computer, 1990, 23(12): 25-37
The partial evaluation approach, which transforms a high-level program into a low-level program that is specialized for a particular application, exposing the parallelism inherent in the underlying numerical computation, is discussed. A prototype compiler that uses partial evaluation is described. Experiments with the compiler have shown that for an important class of numerical programs, partial evaluation can provide marked performance improvements: speedups over conventionally compiled code that range from seven times faster to 91 times faster have been measured. By coupling partial evaluation with parallel scheduling techniques, the low-level parallelism inherent in a computation can be exploited on heavily pipelined or parallel architectures. The approach has been demonstrated by applying a parallel scheduler to a partially evaluated program that simulates the motion of a nine-body solar system.

12.
This paper presents an efficient algorithm that automatically generates a parallel program from a dependence-based representation of a sequential program. The resulting parallel program consists of nested fork-join constructs, composed from the loops and statements of the sequential program. Data dependences are handled by two techniques. One technique implicitly satisfies them by sequencing, thereby reducing parallelism. Where increased parallelism results, the other technique eliminates them by privatization: the introduction of process-specific private instances of variables. Additionally, the algorithm determines when copying values of such instances in and out of nested parallel constructs results in greater parallelism. This is the first algorithm for automatically generating parallelism for such a general model. The algorithm generates as much parallelism as is possible in our model while minimizing privatization. An earlier version of this paper was presented at the First Workshop on Languages and Compilers for Vector and Parallel Machines, which was held at Cornell University in August 1988. That same year a select group of these workshop papers were published in two special issues of the journal: volume 2, numbers 2 and 3.

13.
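This paper addresses sub-word parallelism analysis of serial C programs for image processing algorithms and their retargeting to general-purpose processors with multimedia extensions and to multimedia-specific embedded microprocessors. The nature of image processing algorithms makes them inherently parallelizable. The granularity of this parallelism lies between data-level parallelism (DLP) and instruction-level parallelism (ILP) and is called sub-word parallelism. Current compilation techniques, however, have difficulty fully exposing and locating sub-word parallelism within basic blocks. To address this, a compilation method based on a flow-graph program representation is designed that can explicitly locate sub-word parallelism in serial programs. The compiler is extended with a dedicated pattern library. After control-flow and data-flow analysis based on pattern recognition, it produces a Sub-Word Flow Graph (SWFG), which serves as the intermediate representation for sub-word parallel instruction selection and thus enables effective sub-word parallel code generation.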

14.
This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines.

15.
We present an effective approach to performing data flow analysis in parallel and identify three types of parallelism inherent in this solution process: independent-problem parallelism, separate-unit parallelism and algorithmic parallelism. We present our investigations of Fortran procedures from the Perfect Benchmarks and netlib libraries, which reveal structural characteristics of program flow graphs that are amenable to algorithmic parallelism. Previously, the utility of algorithmic parallelism had been explored using our parallel hybrid algorithm in the context of solving the Reaching Definitions problem for Fortran procedures. Here we present new refinements that optimize performance by increasing the grain size of the parallelism, to improve communication on distributed-memory machines. The empirical performance of our optimized and unoptimized hybrid algorithms for Reaching Definitions is compared on this large data set using an iPSC/2. Our empirical findings qualitatively validate the usefulness of algorithmic parallelism. This research was supported, in part, by National Science Foundation grants CCR-8920078 and CCR-9023628-1, 2/5. An earlier version of this paper appears in Proceedings of the 6th ACM International Conference on Supercomputing (Washington, D.C., July 1992), pp. 236–247.
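
For readers unfamiliar with the Reaching Definitions problem mentioned in this abstract, the following is a minimal sketch of the standard sequential iterative formulation. It is not the paper's parallel hybrid algorithm. The function name, the block labels `B1` to `B3`, and the definition labels `d1` and `d2` are all hypothetical and chosen only for illustration.

```python
def reaching_definitions(blocks, succ, gen, kill):
    """Sequential worklist solver (illustrative only, not the paper's method).

    blocks: list of basic-block names
    succ:   dict mapping a block to its list of successor blocks
    gen:    dict mapping a block to the set of definitions it creates
    kill:   dict mapping a block to the set of definitions it overwrites
    """
    # Derive predecessor lists from the successor map.
    pred = {b: [] for b in blocks}
    for b in blocks:
        for s in succ.get(b, []):
            pred[s].append(b)

    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(gen[b]) for b in blocks}
    worklist = list(blocks)
    while worklist:
        b = worklist.pop()
        # IN[b] is the union of OUT over all predecessors of b.
        in_sets[b] = set().union(*(out_sets[p] for p in pred[b]))
        # OUT[b] = GEN[b] united with (IN[b] minus KILL[b]).
        new_out = gen[b] | (in_sets[b] - kill[b])
        if new_out != out_sets[b]:
            out_sets[b] = new_out
            worklist.extend(succ.get(b, []))  # successors must be revisited
    return in_sets, out_sets


# Hypothetical flow graph: B1 defines x (d1); B2 loops and redefines x (d2).
blocks = ["B1", "B2", "B3"]
succ = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
kill = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}
in_sets, out_sets = reaching_definitions(blocks, succ, gen, kill)
```

In this toy graph both definitions reach the entry of the loop block `B2`, while only `d2` reaches `B3`.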

16.
Partial evaluation is a program transformation that automatically specializes a program with respect to invariants. Despite successful application in areas such as graphics, operating systems, and software engineering, partial evaluators have yet to achieve widespread use. One reason is the difficulty of adequately describing specialization opportunities. Indeed, underspecialization or overspecialization often occurs, without any feedback as to the source of the problem. We have developed a high-level, module-based language allowing the program developer to guide the choice of both the code to specialize and the invariants to exploit during the specialization process. To ease the use of partial evaluation, the syntax of this language is similar to the declaration syntax of the target language of the partial evaluator. To provide feedback, declarations are checked during the analyses performed by partial evaluation. The language has been successfully used by a variety of users, including students having no previous experience with partial evaluation.

17.
Partial evaluation is a symbolic manipulation technique used to produce efficient algorithms when part of the input to the algorithm is known. Other applications of partial evaluators such as universal compilation and compiler generation are also known to be possible. A partial evaluator receives as input a program and partially known input to that program, and outputs a residual program which should run at least as efficiently as the input program with restricted input. In this paper we study the case where both the input and residual programs are logic programs, the partial evaluator itself being a logic program. Up to now, partial evaluators have failed to process large "non-toy" examples. Here we present extensions to partial evaluators which will allow us to produce more efficient residual programs using fewer computing resources during partial evaluation. First, the introduced extensions allow the processing of large examples, which is not possible with the previous techniques. This is now possible since the extensions use less CPU time and memory during the partial evaluation process. Second, the extended partial evaluator produces smaller residual programs, producing important CPU time optimizing effects. With the standard techniques, a partial evaluator will most probably act as a pessimizer, not as an optimizer. Examples are given.
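
The idea of a residual program can be illustrated with a deliberately tiny sketch. It assumes a hand-written specializer for one function rather than a general partial evaluator, and it uses Python instead of the logic programs studied in the paper. The names `power`, `specialize_power`, and `power_3` are hypothetical.

```python
def power(x, n):
    """General program: computes x**n for a non-negative integer n."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n):
    """Toy specializer (hypothetical): with n known in advance, unroll the
    loop and return the source text of a residual program in x alone."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    return f"def power_{n}(x):\n    return {body}\n"


# The static input n = 3 is known; x remains dynamic.
residual_source = specialize_power(3)

# Load the residual program so that it can be called like any function.
namespace = {}
exec(residual_source, namespace)
power_3 = namespace["power_3"]
```

Here `power_3` agrees with `power(x, 3)` for every `x`, but the loop and the counter have been evaluated away at specialization time. A real partial evaluator would derive such a residual program automatically from the source of `power`.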

18.
ParC is an extension of the C programming language with block-oriented parallel constructs that allow the programmer to express fine-grain parallelism in a shared-memory model. It is suitable for the expression of parallel shared-memory algorithms, and also conducive to the parallelization of sequential C programs. In addition, performance enhancing transformations can be applied within the language, without resorting to low-level programming. The language includes closed constructs to create parallelism, as well as instructions to cause the termination of parallel activities and to enforce synchronization. The parallel constructs are used to define the scope of shared variables, and also to delimit the sets of activities that are influenced by termination or synchronization instructions. The semantics of parallelism are discussed, especially relating to the discrepancy between the limited number of physical processors and the potentially much larger number of parallel activities in a program.

19.
A new notion of input/output equivalence of distributed imperative programs, with synchronous communications, is introduced. It preserves the input/output relation, encompassing both initial/final state and communication channel values. For its mathematical justification, the semantic framework of Manna and Pnueli, based on finite transition systems and reduced behaviors, is extended with the notion of input/output behavior. A set of laws for the equivalence is overviewed. A deduction rule for the substitution of references to input/output equivalent procedures is defined and justified in the new semantics. The rule is applied to decompose distributed program simplification proofs, introduced in a prior work, which use the laws to establish the equivalence between a sequential and a parallel communicating program. They include communication elimination as one of their steps. An outline of one such proof, for a pipelined processor model, is included.

20.
In this paper, we describe lazy threads, a new approach for implementing multithreaded execution models on conventional machines. We show how they can implement a parallel call at nearly the efficiency of a sequential call. The central idea is to specialize the representation of a parallel call so that it can execute as a parallel-ready sequential call. This allows excess parallelism to degrade into sequential calls with the attendant efficient stack management and direct transfer of control and data, yet a call that truly needs to execute in parallel gets its own thread of control. The efficiency of lazy threads is achieved through careful attention to storage management and a code generation strategy that allows us to represent potential parallel work with no overhead.
