Similar articles
1.
An automatic vectorizing compiler called V-Pascal is described in detail. The compiler has been designed and implemented with a view to vectorizing Pascal source programs. Using the mechanism of vector indirect addressing, it reduces multiply nested for loops to equivalent single loops, which are then executed in vector mode with sufficiently long vector lengths. The D matrix, which is an adjacency matrix giving dependences between intermediate code nodes, plays an important role in the V-Pascal compiler. It is demonstrated that, in some cases, the V-Pascal compiler yields object code that runs faster than the Fortran counterpart. This paper mainly presents the basic constituents of Version 1 of the V-Pascal compiler. Version 2 includes higher functions such as vectorization of while-do loops and recursive procedures, vectorization of character string manipulations and relational database operations (written in Pascal), and automatic parallel decomposition for multiprocessor environments.
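The loop-collapsing idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the V-Pascal implementation: the function names and the use of precomputed index vectors `rows` and `cols` are assumptions made for illustration, standing in for vector indirect addressing.

```python
def collapse_indices(n_rows, n_cols):
    """Build index vectors that map a flat counter k to the pair (i, j)."""
    total = n_rows * n_cols
    rows = [k // n_cols for k in range(total)]
    cols = [k % n_cols for k in range(total)]
    return rows, cols


def add_nested(a, b):
    """Reference version: an ordinary doubly nested loop."""
    n_rows, n_cols = len(a), len(a[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            out[i][j] = a[i][j] + b[i][j]
    return out


def add_collapsed(a, b):
    """Same computation as one loop of length n_rows * n_cols.

    Each element is reached indirectly through the index vectors, so the
    single loop has a long trip count even when the inner loop is short.
    """
    n_rows, n_cols = len(a), len(a[0])
    rows, cols = collapse_indices(n_rows, n_cols)
    out = [[0] * n_cols for _ in range(n_rows)]
    for k in range(n_rows * n_cols):
        i, j = rows[k], cols[k]
        out[i][j] = a[i][j] + b[i][j]
    return out
```

The collapsed loop visits the same elements in the same order as the nest, so the two versions produce identical results on non-empty rectangular inputs.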

2.
A Vectorizing Compiler for Multimedia Extensions   Total citations: 6, self-citations: 0, citations by others: 6
In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). The compiler identifies data-parallel sections of the code using scalar and array dependence analysis. To enhance the scope for application of the subword semantics, our compiler performs several code transformations. These include strip mining, scalar expansion, grouping and reduction, and distribution. Thereafter, inline assembly instructions corresponding to the data-parallel sections are generated. We have used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler for a number of benchmarks. Initial performance results reveal that the compiler-generated code produces a reasonable performance improvement (speedup of 2 to 6.5) over the code generated without the vectorizing transformations and inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of the hand-tuned code for the MMX architecture.
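Strip mining, one of the transformations named in this abstract, can be sketched in Python as follows. This is only an illustration of the transformation, not the compiler's output: the lane width of 8 (eight unsigned bytes in a 64-bit MMX register) and the saturating add are assumptions chosen for the example, and the helper names are hypothetical.

```python
def saturating_add_u8(x, y):
    """Unsigned 8-bit add that clamps at 255 instead of wrapping."""
    return min(x + y, 255)


def add_strip_mined(a, b, width=8):
    """Add two byte sequences in strips of `width` elements.

    The outer loop walks over full strips; each strip stands for one
    packed operation on `width` lanes. A scalar cleanup loop handles the
    leftover elements when len(a) is not a multiple of `width`.
    """
    n = len(a)
    out = [0] * n
    main = n - n % width  # number of elements covered by full strips
    for start in range(0, main, width):
        stop = start + width
        out[start:stop] = [
            saturating_add_u8(x, y) for x, y in zip(a[start:stop], b[start:stop])
        ]
    for i in range(main, n):  # scalar remainder loop
        out[i] = saturating_add_u8(a[i], b[i])
    return out
```

With 11 input elements and `width=8`, one full strip is processed as a unit and the last three elements fall through to the remainder loop.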

3.
Traditionally a vectorizing compiler matches the iterative constructs of a program against a set of predefined templates. If a loop contains no dependency cycles then a map template can be used; other simple dependencies can often be expressed in terms of fold or scan templates. This paper addresses the template matching problem within the context of functional programming. A small collection of program identities are used to specify vectorizable for-loops. By incorporating these program identities within a monad, all well-typed for-loops in which the body of the loop is expressed using the vectorization monad can be vectorized. This technique enables the elimination of template matching from a vectorizing compiler, and the proof of the safety of vectorization can be performed by a type inference mechanism.
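The three templates named in this abstract (map, fold, scan) can be sketched with Python's standard library. The sketch shows only the loop shapes each template covers; it does not model the paper's vectorization monad, and the function names are hypothetical.

```python
from functools import reduce
from itertools import accumulate
import operator


def map_template(f, xs):
    """No loop-carried dependence: every element is computed independently."""
    return [f(x) for x in xs]


def fold_template(op, init, xs):
    """A single accumulator carried through the loop (a reduction)."""
    return reduce(op, xs, init)


def scan_template(op, xs):
    """Like a fold, but every intermediate accumulator value is kept."""
    return list(accumulate(xs, op))
```

For example, squaring each element fits the map template, summing fits the fold template, and a running sum fits the scan template.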

4.
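The trustworthiness of software depends to a large extent on the trustworthiness of its program code. The main factors affecting software trustworthiness include code defects, code errors, and program faults from inside the software, as well as viruses and malicious code from outside it; ensuring trustworthiness at the code level is therefore one of the important routes to trusted software. As one of the key pieces of system software, the compiler and its trustworthiness are of great significance to the whole computer system. Programs generally must be compiled before they can run, and if the compiler is not trustworthy, the trustworthiness of the code it generates cannot be guaranteed. This paper discusses the main ideas and key techniques for designing and implementing a trusted compiler.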

5.
In compiling applications for distributed memory machines, runtime analysis is required when data to be communicated cannot be determined at compile-time. One such class of applications requiring runtime analysis is block structured codes. These codes employ multiple structured meshes, which may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present runtime and compile-time analysis for compiling such applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which supports the runtime analysis required. The library is currently implemented on several different systems. We have also developed compiler analysis for determining data access patterns at compile time and inserting calls to the appropriate runtime routines. Our methods can be used by compilers for HPF-like parallel programming languages in compiling codes in which data distribution, loop bounds and/or strides are unknown at compile-time. To demonstrate the efficacy of our approach, we have implemented our compiler analysis in the Fortran 90D/HPF compiler developed at Syracuse University. We have experimented with a multi-block Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads and the compiler-parallelized codes perform within 20% of the codes parallelized by manually inserting calls to the runtime library.

6.
An introduction to a formal theory of dependence analysis   Total citations: 3, self-citations: 1, citations by others: 2
Dependence analysis is a very important part of any vectorizing or concurrentizing compiler. This paper is an introduction to a formal theory of dependence analysis. The emphasis here is on rigor; the subject matter is not new. The program model is a Fortran do loop consisting of loops and assignment statements. We carefully explain the key dependence concepts and indicate through examples how the dependence tests work. These ideas and methods can be readily extended to more general programs.
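As an example of the kind of dependence test this abstract refers to, here is a minimal sketch of the classic GCD test. The abstract does not say which tests the paper covers, so the choice of the GCD test is an assumption, as are the function name and the parameter layout. The test considers two references `a[c1*i + k1]` and `a[c2*j + k2]` and asks whether `c1*i + k1 == c2*j + k2` can have any integer solution; loop bounds are ignored, so a `True` result means only that a dependence cannot be ruled out.

```python
from math import gcd


def gcd_test(c1, k1, c2, k2):
    """Return False if a[c1*i + k1] and a[c2*j + k2] can never overlap.

    The equation c1*i - c2*j = k2 - k1 has an integer solution exactly
    when gcd(c1, c2) divides k2 - k1. Loop bounds are not considered, so
    True means "dependence possible", not "dependence proven".
    """
    g = gcd(c1, c2)
    if g == 0:
        # Both subscripts are constants: they overlap only if equal.
        return k1 == k2
    return (k2 - k1) % g == 0
```

For instance, a write to `a[2*i]` and a read of `a[2*j + 1]` touch even and odd elements respectively, so the test reports that no dependence is possible.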

7.
Andorra-I is an experimental parallel Prolog system which transparently exploits both dependent and-parallelism and or-parallelism. One of the main components of Andorra-I is its preprocessor. In order to obtain efficient execution of programs in Andorra-I, the preprocessor includes a compiler for Andorra-I. The compiler includes a determinacy analyser and a clause compiler, and generates code for a specialised abstract machine. In this paper we discuss the main issues in the Andorra-I compiler, presenting its abstract instruction set and describing the algorithms used in its implementation.

8.
Linear recurrences are the most important class of nonvectorizable problems in typical scientific/engineering calculations. This work discusses high-performance methods for solving first-order linear recurrences on a vector computer, investigates automatic transformations, and develops compiling techniques for first-order linear recurrence problems. The results show that the improved vector code generated by the vectorizing compiler on the HITAC S-820 supercomputer runs at the rate of 150 MFLOPS (million floating-point operations per second) for moderate loop lengths (> 1000) and over 200 MFLOPS for long loop lengths (> 10000). Also, overall performance improvements of 69% in the 14 Lawrence Livermore Loops and 25% in the 24 Lawrence Livermore Loops, as measured by the harmonic mean, are attained.
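A first-order linear recurrence has the form x[i] = a[i] * x[i-1] + b[i], where each value depends on the previous one. The abstract does not state which solution method the paper uses; the sketch below shows recursive doubling, one well-known way to expose vector parallelism in such a recurrence, and is offered only as an assumption-laden illustration with hypothetical function names.

```python
def recurrence_sequential(a, b, x0):
    """Reference version: x[i] = a[i] * x[i-1] + b[i], starting from x0."""
    xs = []
    x = x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def recurrence_doubling(a, b, x0):
    """Solve the same recurrence with about log2(n) elementwise sweeps.

    The pair (A[i], B[i]) represents the affine map x -> A[i]*x + B[i]
    that carries an earlier value of the sequence to x[i]. Each sweep
    composes every map with the one `step` positions before it, doubling
    the distance it covers. The inner loop has no loop-carried dependence
    because it reads only the arrays from the previous sweep.
    """
    n = len(a)
    A, B = list(a), list(b)
    step = 1
    while step < n:
        new_A, new_B = A[:], B[:]
        for i in range(step, n):
            new_A[i] = A[i] * A[i - step]
            new_B[i] = A[i] * B[i - step] + B[i]
        A, B = new_A, new_B
        step *= 2
    return [A[i] * x0 + B[i] for i in range(n)]
```

The doubling version performs more arithmetic in total than the sequential loop, which is the usual trade-off for making each sweep vectorizable.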

9.
覃安  符红光 《计算机应用》2005,25(9):2041-2043,2046
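This paper proposes a normalized treatment of loop tests: the increasing and decreasing loops that the compiler encounters during parsing are handled as a single pattern, while deciding whether a loop increases or decreases is left to the target code generation module. With this method, the functions of the modules in the compilation process become more balanced and simpler, and more optimization opportunities are made available to code optimization. GiNaC is an open-source symbolic computation platform based on Linux; applying this method in the design of the GiNaC compiler gave good results.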

10.
An Effective Code Motion Algorithm for Compiler Optimization   Total citations: 1, self-citations: 0, citations by others: 1
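Code motion is a key technique in the global optimization of compilers. This paper introduces a new code motion algorithm that performs both common subexpression elimination and the motion of loop-invariant computations. The algorithm does not need to detect loop control structures; code motion is achieved through data-flow analysis alone, which makes the method very effective.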

11.
In simultaneous multithreading (SMT) multiprocessors, using all the available threads (logical processors) to run a parallel loop is not always beneficial due to the interference between threads and parallel execution overhead. To maximize the performance of a parallel loop on an SMT multiprocessor, it is important to find an appropriate number of threads for executing the parallel loop. This article presents adaptive execution techniques that find a proper execution mode for each parallel loop in a conventional loop-level parallel program on SMT multiprocessors. A compiler preprocessor generates code that, based on dynamic feedback, automatically determines at run time the optimal number of threads for each parallel loop in the parallel application. We evaluate our technique using a set of standard numerical applications and running them on a real SMT multiprocessor machine with 8 hardware contexts. Our approach is general enough to work well with other SMT multiprocessor or multicore systems.

12.
An Effective Parallelism Analysis Algorithm   Total citations: 3, self-citations: 0, citations by others: 3
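Parallelism analysis plays a very important role in a parallelizing compilation system, and its quality directly affects the success or failure of the compilation system. With the development of cluster systems and their parallel development environments, most parallel systems can support running multiply nested parallel loops. For programming systems that support only a single level of parallel loop, choosing the loop with the highest parallel execution efficiency is also very important. To this end, this paper proposes an effective loop parallelism analysis scheme that not only gives the parallelism of multi-level loops but can also handle the great majority of parallelism problems in practical applications. This paper modifies the traditional parallelism analysis algorithm and presents an effective …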

13.
Irregular access patterns are a major problem for today’s optimizing compilers. In this paper, a novel approach will be presented that enables transformations that were designed for regular loop structures to be applied to linked list data structures. This is achieved by linearizing access to a linked list, after which further data restructuring can be performed. Two subsequent optimization paths will be considered: annihilation and sublimation, which are driven by the occurring regular and irregular access patterns in the applications. These intermediate codes are amenable to traditional compiler optimizations targeting regular loops. In the case of sublimation, a run-time step is involved which takes the access pattern into account and thus generates a data instance specific optimized code. Both approaches are applied to a sparse matrix multiplication algorithm and an iterative solver: preconditioned conjugate gradient. The resulting transformed code is evaluated using the major compilers for the x86 platform, GCC and the Intel C compiler.

14.
On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring compiler community has developed locality-enhancing program transformations such as loop permutation and tiling. These transformations work well for perfectly nested loops (loops in which all assignment statements are contained in the innermost loop), but their performance on codes such as matrix factorizations that contain imperfectly nested loops leaves much to be desired. In this paper, we propose an alternative approach called data-centric transformation. Instead of reasoning directly about the control structure of the program, a compiler using the data-centric approach chooses an order for the arrival of data elements in the cache, determines what computations should be performed when that data arrives, and generates the appropriate code. At runtime, program execution will automatically pull data into the cache in an order that corresponds approximately to the order chosen by the compiler; since statements that touch a data structure element are scheduled close together, locality is improved. The idea of data-centric transformation is very general, and in this paper, we discuss a particular transformation called data-shackling. We have implemented shackling in the SGI MIPSPro compiler which already has a sophisticated implementation of control-centric transformations for locality enhancement. We present experimental results on the SGI Octane comparing the performance of the two approaches, and show that for dense numerical linear algebra codes, data-shackling does better by factors of two to five.

15.
Loop skewing is a new procedure to derive the wavefront method of execution of nested loops. The wavefront method is used to execute nested loops on parallel and vector computers when none of the loops can be done in vector mode. Loop skewing is a simple transformation of loop bounds and is combined with loop interchanging to generate the wavefront. This derivation is particularly suitable for implementation in compilers that already perform automatic detection of parallelism and generation of vector and parallel code, such as are available today. Loop normalization, a loop transformation used by several vectorizing translators, is related to loop skewing, and we show how loop normalization, applied blindly, can adversely affect the parallelism detected by these translators.
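The wavefront execution described in this abstract can be sketched in Python on a small example. The recurrence `a[i][j] = a[i-1][j] + a[i][j-1]` is an assumption picked for illustration: neither the `i` loop nor the `j` loop can run in vector mode, because each carries a dependence. Skewing with `t = i + j` and interchanging so that `t` is the outer loop makes every iteration of the inner loop independent.

```python
def fill_sequential(n, m):
    """Reference version: both loops carry a dependence."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def fill_wavefront(n, m):
    """Skewed and interchanged version.

    The outer loop walks over anti-diagonals t = i + j. All points on one
    anti-diagonal depend only on points of the previous anti-diagonal, so
    the inner loop over i could be executed in vector or parallel mode.
    """
    a = [[1] * m for _ in range(n)]
    for t in range(2, n + m - 1):      # t runs from 2 to (n-1) + (m-1)
        lo = max(1, t - (m - 1))       # keeps j = t - i <= m - 1
        hi = min(n - 1, t - 1)         # keeps j = t - i >= 1
        for i in range(lo, hi + 1):    # independent iterations
            j = t - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

The `max` and `min` expressions in the inner bounds are the visible cost of the transformation: the skewed iteration space is a parallelogram rather than a rectangle.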

16.
17.
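Applying certifying compilation to solve the security problems of mobile code is an approach that has recently begun to be studied internationally. Its main feature is that the primary task of ensuring that the security policy is satisfied shifts from the code consumer to the code producer, which effectively relieves the excessive run-time burden on the code consumer. In addition, it verifies the code itself rather than the identity of the code producer, so it offers higher credibility and can support anonymous code. This paper analyzes this line of research, from which it can be seen that supporting higher levels of security is the key to wider adoption of the technique. In response to this need, the paper also outlines our own planned work along the way, in the hope of sharing it with colleagues.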

18.
The paper presents approaches to the validation of optimizing compilers. The emphasis is on aggressive and architecture-targeted optimizations which try to obtain the highest performance from modern architectures, in particular EPIC-like micro-processors. Rather than verify the compiler, the approach of translation validation performs a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code. First we survey the standard approach to validation of optimizations which preserve the loop structure of the code (though they may move code in and out of loops and radically modify individual statements), present a simulation-based general technique for validating such optimizations, and describe a tool, VOC-64, which implements this technique. For more aggressive optimizations which, typically, alter the loop structure of the code, such as loop distribution and fusion, loop tiling, and loop interchanges, we present a set of permutation rules which establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the necessary extensions to the VOC-64 in order to validate these structure-modifying optimizations. Finally, the paper discusses preliminary work on run-time validation of speculative loop optimizations, that involves using run-time tests to ensure the correctness of loop optimizations which neither the compiler nor compiler-validation techniques can guarantee the correctness of. Unlike compiler validation, run-time validation has not only the task of determining when an optimization has generated incorrect code, but also has the task of recovering from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange, loop tiling, and software pipelining and appears to be quite promising.

19.
We extend the well-known interval analysis method so that it can be used to gather global flow information for individual array elements. Data dependences between all array accesses in different basic blocks, different iterations of the same loop, and across different loops are computed and represented as labelled arcs in a program flow graph. This approach results in a uniform treatment of scalars and arrays in the compiler and builds a systematic basis from which the compiler can perform numerous global optimizations. This global dataflow analysis is performed as a separate phase in the compiler. This phase only gathers the global relationships between different accesses to a variable, yet the use of this information is left to the code generator. This organization substantially simplifies the engineering of an optimizing compiler and separates the back end of the compiler (e.g. code generator and register allocator) from the flow analysis part. The global dataflow analysis algorithm described in this paper has been implemented and used in an optimizing compiler for a processor with deep pipelines. This paper describes the algorithm and its compact implementation and evaluates it, both with respect to the accuracy of the information and to the compile-time cost of obtaining and using it.

20.
Although considerable technology has been developed for debugging and developing sequential programs, producing verifiably correct parallel code is a much harder task. In view of the large number of possible scheduling sequences, exhaustive testing is not a feasible method for determining whether a given parallel program is correct; nor have there been sufficient theoretical developments to allow the automatic verification of parallel programs. PTOOL, a tool being developed at Rice University in collaboration with users at Los Alamos National Laboratory, provides an alternative mechanism for producing correct parallel code. PTOOL is a semi-automatic tool for detecting implicit parallelism in sequential Fortran code. It uses vectorizing compiler techniques to identify dependences preventing the parallelization of sequential regions. According to the model supported by PTOOL, a programmer should first implement and test his program using traditional sequential debugging techniques. Then, using PTOOL, he can select loop bodies that can be safely executed in parallel. At Los Alamos, we have been interested in examining the role of dependence-analysis tools in the parallel programming process. Therefore, we have used PTOOL as a static debugging tool to analyze parallel Fortran programs. Our experiences using PTOOL lead us to conclude that dependence-analysis tools are useful to today's parallel programmers. Dependence analysis is particularly useful in the development of asynchronous parallel code. With a tool like PTOOL, a programmer can guarantee that processor scheduling cannot affect the results of his parallel program. If a programmer wishes to implement a partially parallelized region through the use of synchronization primitives, however, he will find that dependence analysis is less useful. While a dependence-analysis tool can greatly simplify the task of writing synchronization code, the ultimate responsibility of correctness is left to the programmer. This work was performed under the auspices of the U.S. Department of Energy.
