期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Interprocedural partial redundancy elimination with application todistributed memory compilation

Agrawal G. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(7):609-625

Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop invariant code motion and redundant code elimination. In this paper, we address the problem of performing this optimization interprocedurally. We present an Interprocedural Partial Redundancy Elimination (IPRE) scheme based upon a new, concise, full program representation. Our method is applicable to arbitrary recursive programs. We use interprocedural partial redundancy elimination for placement of communication and communication preprocessing statements while compiling for distributed memory parallel machines. We have implemented our scheme as an extension to the Fortran D compilation system. We present experimental results from two codes compiled using our system to demonstrate the useful of IPRE in distributed memory compilation 相似文献

2.

Specifying and Automatically Generating a Specialization Tool for Fortran 90

Sandrine Blazy 《Automated Software Engineering》2000,7(4):345-376

Partial evaluation is an optimization technique traditionally used in compilation. We have adapted this technique to the understanding of scientific application programs during their maintenance. We have implemented a tool that analyzes Fortran 90 application programs and performs an interprocedural pointer analysis. This paper presents a dynamic semantics of Fortran 90 and manually derives a partial evaluator from this semantics. The tool implementing the specifications is also detailed. The partial evaluator has been implemented in a generic programming environment and a graphical interface has been developed to visualize the information computed during the partial evaluation (values of variables, already analyzed procedures, scope of variables, removed statements, etc.). 相似文献

3.

一个面向C和Fortran数值程序的静态分析工具

侯苏宁陈立前王昭飞王戟《计算机工程与科学》2011,33(3):94

程序的正确性验证一直以来都是计算机科学中的一个挑战性问题,抽象解释理论为程序静态分析提供了一个通用框架,可以在编译时自动地推导程序的动态性质。基于抽象解释的数值程序分析可以自动推导程序中数值变量间的不变式关系,这对于编译优化、程序错误检查至关重要。本文建立并实现了一个面向C和Fortran程序并支持过程间分析的数值程序分析框架和工具,C或Fortran源程序经过预处理后转化为具有统一格式的中间表示形式,然后基于该中间表示抽取与源程序语义等价的语义等式,最后在该语义等式上进行不动点迭代计算从而得到程序不变式。在此基础上,本文还对数组等复杂语法结构进行了建模和抽象。实验结果表明,该工具具有较高的可扩展性、精度,并能够处理大部分因数组的使用而带来的程序分析上的问题。相似文献

4.

Performance of the Prototype Fortran D Compiler

CHAU-WEN TSENG 《Software》1997,27(7):763-796

Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D compiler when compiling linear algebra codes and whole programs. Statement groups, execution conditions, inter-loop communication optimizations, multi-reductions, and array kills for replicated arrays are identified as new compilation issues. On the Intel iPSC/860, the output of the prototype Fortran D compiler approaches the performance of hand-optimized code for parallel computations, but needs improvement for linear algebra and pipelined codes. The Fortran D compiler outperforms the CM Fortran compiler (2.1 beta) by a factor of four or more on the TMC CM-5 when not using vector units. Its performance is comparable to the DEC and IBM HPF compilers on a Alpha cluster and SP-2. Better analysis, run-time support, and flexibility are required for the prototype compiler to be useful for a wider range of programs. © 1997 John Wiley & Sons, Ltd. 相似文献

5.

Interprocedural Data Flow Based Optimizations for Distributed Memory Compilation

GAGAN AGRAWAL JOEL SALTZ 《Software》1997,27(5):519-545

Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications having irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of runtime preprocessing routine and collective communication routines inserted for managing communication in such codes. We also present two new interprocedural optimizations: placement of scatter routines and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D compilation system as the necessary infrastructure. We present experimental results from two codes compiled usng our system to demonstrate the efficacy of the presented schemes. ©1997 John Wiley & Sons, Ltd. 相似文献

6.

一种精确的调用图生成技术

唐新春郭春榕《计算机工程与设计》1997,18(5):34-39

调用图是过程间分析和程度自动并行化的基础。生成精确调用图可以进一步开发程序的并行性。此文针对Ｆｏｒｔｒａｎ程序，提出了一项完全消除哑过程，产生精确调用图的技术与相应的算法。该算法已在面向ＭＰＰＦｏｒｔｒａｎ的程序自动并行化工具中实现。相似文献

7.

Selecting and using data for integration testing

Harrold M.J. Soffa M.L. 《Software, IEEE》1991,8(2):58-65

The extension of dataflow testing to interprocedural testing is described. This was done by developing both an analysis technique that computes the required interprocedural definition-use information, for both direct and indirect dependencies and a testing technique that uses this information in selecting and executing the subpaths across procedure boundaries. A testing tool that implements this technique is presented. For the interprocedural dataflow analysis, the technique summarizes the individual procedures' definition and use information at call sites and then propagates this information throughout the interacting procedures. By efficiently computing the interprocedural data dependencies before testing, the approach lets the testing tool use existing path-selection techniques based on dataflow for interprocedural testing. To track the execution path, the technique recognizes the calls to and returns from procedures and handles the association of various names with a definition as the execution path is being inspected. The technique handles recursive procedures and supports separate compilation of procedures 相似文献

8.

A compiler optimization algorithm for shared-memory multiprocessors

McKinley K.S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):769-787

This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel, Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines 相似文献

9.

Code optimization across procedures

Richardson S. Ganapathi M. 《Computer》1989,22(2):42-50

Procedure calls can be a major obstacle to the analysis of computer programs, preventing significant improvements in program speed. A broad range of techniques, each of which is in some sense interprocedural by nature, is considered to overcome this obstacle. Some techniques rely on interprocedural dataflow in their analysis. Others require interprocedural information in the form of detailed profile data or information concerning the scope of a given procedure in relation to other procedures. These include procedure integration, interprocedural register allocation, pointer and alias tracking, and dependency analysis 相似文献

10.

Interprocedural alias analysis: Implementation and empirical results

Herbert G. Mayer Michael Wolfe 《Software》1993,23(11):1201-1233

相似文献

11.

OpenMP指导语句全局嵌套类型的静态分析及应用

下载免费PDF全文

陈永健舒继武李建江王鼎兴《软件学报》2005,16(2):194-204

由于指导语句动态嵌套与绑定规则的存在,OpenMP程序中线程的一些上下文只能在运行时刻才能完全确定.然而,通过编译时刻的静态分析可以部分确定指导语句的嵌套类型,这些信息可以用于指导后续的编译与优化.由于函数调用的存在,嵌套与绑定常常会跨越过程边界,除了通常的局部和全局分析之外,还需要过程间分析的支持.通过在通常的过程间分析的基础上附加信息,可以使得嵌套类型信息在过程调用图中进行传播.将这些全局信息与过程内的局部信息结合起来,就可以在编译时刻确定语句的嵌套类型.结果表明,编译时刻的嵌套类型分析可以有效地确定通常的科学与工程计算程序中指导语句的嵌套类型,基于嵌套类型的翻译与优化可以同时减少运行时开销和目标代码长度. 相似文献

12.

Compilation of set terms in the logic data language (LDL)

《The Journal of Logic Programming》1992,12(1-2):89-119

We propose compilation methods for the efficient support of set-term matching in Horn-clause programs. Rather than using general-ourpose set-matching algorithms, we take the approach of formulating at compile time specialized computation plans that, by taking advantage of information available in the given rules, limit the number of alternatives explored. Our strategy relies on rewriting techniques to transform the problem into an “ordinary” Horn-clause compilation problem, with minimal additional overhead. The execution cost of the rewritten rules is substantially lower than that of the original rules, and the additional cost of compilation can thus be amortized over many executions. 相似文献

13.

Support of automatic parallelization with concept comprehension

《Journal of Systems Architecture》1999,45(6-7):427-439

Current approaches to parallelizing compilation perform a purely structural analysis of the sequential code. Conversely, a semantic analysis performing concept assignment for code sections, can support the recognition of the algorithms that the code implements. This can considerably help the parallelization process, by allowing the introduction of heuristics and an extensive pruning of the search space, and thus enabling the application of more aggressive code transformations. It can play an important role in overcoming the current limitations to Automatic Parallelization. In this paper we discuss the applicability of concept comprehension to the parallelization process, and we present a novel technique for automatic algorithmic recognition we have designed and implemented. We are currently developing a reverse engineering tool supporting the translation of sequential Fortran code into HPF, which is based on the recognition technique we have developed. Its working criteria are illustrated and discussed. 相似文献

14.

Parallelization support for coupled grid applications with small meshes

Lorie M. Liebrock Ken Kennedy 《Concurrency and Computation》1996,8(8):581-615

Composite grid problems arise in important application areas, e.g. reactor simulation. Related physical phenomena are inherently parallel and their simulations are computationally intensive. Unfortunately, parallel languages, such as High Performance Fortran, provide little support for these problems. We illustrate topological connections via a coupling statement, develop a programming style and transformation system to support composite grid code development, and develop an algorithm that automatically determines distributions for composite grid problems with small meshes. A mesh is classified as small if the amount of computational work associated with the mesh is less than the amount of work to be assigned to a single processor. Precompiler transformations, such as cloning for alignment specification, are described. Excerpts from a High Performance Fortran program before and after transformation illustrate user programming style and transformation issues. Our distribution algorithm's alignment and distribution specifications are input to the transformed High Performance Fortran programs which applies the mapping for execution of the simulation code. Some advantages of this approach are: transformations are applied before compilation and allow communication optimization; data distribution may be determined for any number of problems without recompilation; user determined distribution for parallelization is unnecessary; portability is improved. We validate the topology-based data distribution algorithm using a number of reactor configurations. Two random distribution algorithms provide a basis of comparison with measures of load balance and communication cost. Experiments show that the topology-based distribution algorithm almost always obtains load balance at least as good as, and often significantly better than, random algorithms while reducing the total communication per iteration from 50% to as much as a factor of ten. 相似文献

15.

Analysis of interprocedural side effects in a parallel programming environment

《Journal of Parallel and Distributed Computing》1988,5(5):517-550

This paper addresses the analysis of subroutine side effects in the ParaScope programming environment, an ambitious collection of tools for developing, understanding, and compiling parallel programs. In spite of significant progress in the optimization of programs for execution on parallel and vector computers, compilers must still be very conservative when optimizing the code surrounding a call site, due to the lack of information about the code in the subroutine being invoked. This has resulted in the development of algorithms for interprocedural analysis of the side effects of a subroutine, which summarize the body of a subroutine, producing approximate information to improve optimization. This paper reviews the effectiveness of these methods in preparing programs for execution on parallel computers. It is shown that existing techniques are insufficient and a new technique, called regular section analysis, is described. Regular section analysis extends the lattice used in previous interprocedural analysis methods to one that is rich enough to represent common array access patterns: elements, rows, columns, and their higher-dimensional analogs. Regular sections are defined, their properties are established, and the modifications to existing interprocedural analysis algorithms required to handle regular sections are presented. Among these modifications are methods for dealing with language features that reshape array parameters at call sites. In addition to improved precision of summary information, we also examine two problems crucial to effective parallelization. The first addresses the need for information about which variables are always redefined as a side effect of a call and the second addresses the requirement that, for parallel programming, information about side effects must be qualified by information about any critical regions in which those side effects take place. These problems are solved by extensions to existing interprocedural dataflow analysis frameworks. 相似文献

16.

串行程序并行化中计算代码与同步通信代码的自动生成

龚雪容生拥宏沈亚楠《计算机应用与软件》2008,25(1):91-92,104

简要介绍了并行编译中的计算划分和依赖关系分析,提出如何利用计算划分和依赖关系自动生成并行程序中的计算代码和同步通信代码. 相似文献

17.

Precise slicing of concurrent programs

Dennis Giffhorn Christian Hammer 《Automated Software Engineering》2009,16(2):197-234

While there exist efficient algorithms to slice sequential programs precisely, there are only two algorithms for precise slicing of concurrent interprocedural programs with recursive procedures (Krinke in Proc. ESEC/FSE’03, pp. 178–187, 2003; Nanda and Ramesh in ACM Toplas. 28(6):1088–1144, 2006). We present an empirical evaluation of both algorithms for Java. We demonstrate that both algorithms provide the same precision up to the model of concurrency in use and show that the concurrency model has strong impact on slice precision and computation costs. Furthermore, we extend both algorithms to support dynamic thread creation both in loops and recursion—a feature that the original algorithms could not fully handle. The worst case complexity of the algorithms being exponential, we developed several optimizations and compared these with each other and with algorithms that trade precision for speed. Finally, we show that one algorithm may produce incorrect slices and present a remedy. 相似文献

18.

Studying energy trade offs in offloading computation/compilation in Java-enabled mobile devices 总被引：1，自引：0，他引：1

Chen G. Kang B.-T. Kandemir M. Vijaykrishnan N. Irwin M.J. Chandramouli R. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(9):795-809

Java-enabled wireless devices are preferred for various reasons. For example, users can dynamically download Java applications on demand. The dynamic download capability supports extensibility of the mobile client features and centralizes application maintenance at the server. Also, it enables service providers to customize features for the clients. In this work, we extend this client-server collaboration further by offloading some of the computations (i.e., method execution and dynamic compilation) normally performed by the mobile client to the resource-rich server in order to conserve energy consumed by the client in a wireless Java environment. In the proposed framework, the object serialization feature of Java is used to allow offloading of both method execution and bytecode-to-native code compilation to the server when executing a Java application. Our framework takes into account communication, computation, and compilation energies to decide where to compile and execute a method (locally or remotely), and how to execute it (using interpretation or just-in-time compilation with different levels of optimizations). As both computation and communication energies vary based on external conditions (such as the wireless channel state and user supplied inputs), our decision must be done dynamically when a method is invoked. Our experiments, using a set of Java applications executed on a simulation framework, reveal that the proposed techniques are very effective in conserving the energy of the mobile client. 相似文献

19.

The complementary relationship of interprocedural register allocation and inlining

Feipei Lai Yung-kuang Chao Chia-Jung Hsieh 《International journal of parallel programming》1994,22(4):409-434

Inline expansion and interprocedural register allocation are two general approaches used for interprocedural optimization. However, there are certain situations which prevent either of them from being applied smoothly to procedure calls. Especially, interactions between inlining and register allocation can cause an inlined version of a program to run more slowly than its noninlined counterpart. This paper describes a method of integrating inlining and interprocedural register allocation to reduce the procedure call overhead without this negative effect. We use profile information to identify the heavy called procedures regions and the register usage information of each code site to optimize the placement of the register save/restore code. This method also takes full advantage offree-use registers at each procedure call site. The average performance improvement is 1.21 compared with the previous schemes that performed either of them independently. 相似文献

20.

Simplifying Control Flow in Compiler-Generated Parallel Code

John Mellor-Crummey Vikram Adve 《International journal of parallel programming》1998,26(5):613-638

Optimizing compilers for data-parallel languages such as High Performance Fortran perform a complex sequence of transformations. However, the effects of many transformations are not independent, which makes it challenging to generate high quality code. In particular, some transformations introduce conditional control flow, while others make some conditionals unnecessary by refining program context. Eliminating unnecessary conditional control flow during compilation can reduce code size and remove a source of overhead in the generated code. This paper describes algorithms to compute symbolic constraints on the values of expressions used in control predicates and to use these constraints to identify and remove unnecessary conditional control flow. These algorithms have been implemented in the Rice dHPF compiler and we show that these algorithms are effective in reducing the number of conditionals and the overall size of generated code. Finally, we describe a synergy between control flow simplification and data-parallel code generation based on loop splitting which achieves the effects of more narrow data-parallel compiler optimizations such as vector message pipelining and the use of overlap areas. 相似文献