
1.

Evaluation of Fortran vector compilers and preprocessors

Glenn Luecke Waqar Haque James Hoekstra Howard Jespersen James Coyle 《Software》1991,21(9):891-905

Many scientific codes can achieve significant performance improvement when executed on a computer equipped with a vector processor. Vector constructs in source code should be recognized by a vectorizing compiler or preprocessor. This paper discusses, from a general point of view, how a vectorizing compiler/preprocessor can be evaluated. The areas discussed include data dependence analysis, IF loop analysis, nested loops, loop interchanging, loop collapsing, indirect addressing, use of temporary storage, and order of arithmetic. The ideas presented are based on vectorization of over a million lines of production codes and an extensive test suite developed to evaluate preprocessors under varying degrees of code complexity. Areas for future research are also discussed.

2.

V-Pascal: An automatic vectorizing compiler for Pascal with no language extensions

Takao Tsuda Yoshitoshi Kunieda 《The Journal of supercomputing》1990,4(3):251-275

An automatic vectorizing compiler called V-Pascal is described in detail. The compiler has been designed and implemented with a view to vectorizing Pascal source programs. Using the mechanism of vector indirect addressing, it reduces multiply nested for loops to equivalent single loops, which are then executed in vector mode with sufficiently long vector lengths. The D matrix, which is an adjacency matrix giving dependences between intermediate code nodes, plays an important role in the V-Pascal compiler. It is demonstrated that, in some cases, the V-Pascal compiler yields object code that runs faster than the Fortran counterpart. This paper mainly presents the basic constituents of Version 1 of the V-Pascal compiler. Version 2 includes higher functions such as vectorization of while-do loops and recursive procedures, vectorization of character string manipulations and relational database operations (written in Pascal), and automatic parallel decomposition for multiprocessor environments.
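The idea of collapsing a nested loop into one long loop through indirect addressing can be illustrated with a minimal Python sketch. It is not V-Pascal's implementation: the function names and the triangular example are assumptions chosen for illustration, and plain Python lists stand in for vector registers.

```python
def triangular_sum_nested(a):
    """Reference version: sum the lower triangle with a doubly nested loop."""
    total = 0
    for i in range(len(a)):
        for j in range(i + 1):          # inner trip count varies with i
            total += a[i][j]
    return total


def collapsed_index_vectors(n):
    """Build index vectors that map one loop counter k to the pair (i, j)."""
    rows, cols = [], []
    for i in range(n):
        for j in range(i + 1):
            rows.append(i)
            cols.append(j)
    return rows, cols


def triangular_sum_collapsed(a):
    """Same sum as a single loop of length n*(n+1)/2 using indirect addressing."""
    rows, cols = collapsed_index_vectors(len(a))
    total = 0
    for k in range(len(rows)):          # one long loop instead of many short ones
        total += a[rows[k]][cols[k]]    # gather-style access through index vectors
    return total
```

The single loop has one long trip count rather than many short inner trips, which is the property the abstract associates with sufficiently long vector lengths.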

3.

An introduction to a formal theory of dependence analysis (total citations: 3; self-citations: 1; citations by others: 2)

Utpal Banerjee 《The Journal of supercomputing》1988,2(2):133-149

Dependence analysis is a very important part of any vectorizing or concurrentizing compiler. This paper is an introduction to a formal theory of dependence analysis. The emphasis here is on rigor; the subject matter is not new. The program model is a Fortran do loop consisting of loops and assignment statements. We carefully explain the key dependence concepts and indicate through examples how the dependence tests work. These ideas and methods can be readily extended to more general programs.
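As an illustration of what a dependence test looks like, here is a minimal Python sketch of the classic GCD test for two references with linear subscripts. It is offered as a well-known example of such a test, not as the specific formulation used in the paper; the function name and parameter names are assumptions.

```python
from math import gcd


def gcd_test(a1, b1, a2, b2):
    """Conservative test for references x[a1*i + b1] and x[a2*j + b2].

    Returns False when no integers i, j can make the two subscripts equal,
    so the references are independent. Returns True when a dependence is
    possible (loop bounds are ignored, so True is not a proof).
    """
    g = gcd(a1, a2)
    if g == 0:                      # both subscripts are constants
        return b1 == b2
    # a1*i - a2*j = b2 - b1 has an integer solution iff g divides b2 - b1
    return (b2 - b1) % g == 0
```

For example, `gcd_test(2, 0, 2, 1)` is False because `x[2*i]` touches only even elements and `x[2*j + 1]` only odd ones, while `gcd_test(1, 0, 1, -1)` is True because `x[i]` and `x[j - 1]` can refer to the same element.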

4.

A Vectorizing Compiler for Multimedia Extensions (total citations: 6; self-citations: 0; citations by others: 6)

N. Sreraman R. Govindarajan 《International journal of parallel programming》2000,28(4):363-400

In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). This compiler would identify data parallel sections of the code using scalar and array dependence analysis. To enhance the scope for application of the subword semantics, our compiler performs several code transformations. These include strip mining, scalar expansion, grouping and reduction, and distribution. Thereafter inline assembly instructions corresponding to the data parallel sections are generated. We have used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler for a number of benchmarks. Initial performance results reveal that our compiler-generated code produces a reasonable performance improvement (speedup of 2 to 6.5) over the code generated without the vectorizing transformations/inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of the hand-tuned code for MMX architecture.
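Of the transformations listed, strip mining is the easiest to show in isolation. The following Python sketch is a hypothetical illustration, not the compiler's output: the function name and the default width of 8 are assumptions (eight 8-bit subwords would fill one 64-bit MMX register), and a list slice stands in for a single subword-parallel instruction.

```python
def strip_mined_add(a, b, width=8):
    """Element-wise add, restructured into fixed-width strips plus a remainder."""
    n = len(a)
    out = [0] * n
    main = n - n % width                      # largest multiple of width <= n
    for start in range(0, main, width):
        stop = start + width
        # Each strip models one subword-parallel operation on `width` elements.
        out[start:stop] = [x + y for x, y in zip(a[start:stop], b[start:stop])]
    for i in range(main, n):                  # scalar clean-up loop
        out[i] = a[i] + b[i]
    return out
```

The strip loop is the part a vectorizer would replace with vector instructions; the clean-up loop handles trip counts that are not a multiple of the strip width.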

5.

A Vectorization Method Based on Compiler Directives


姚远 赵荣彩 《计算机工程》2012,38(12):272-275

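Because their program analysis is limited, compilers either fail to vectorize loops automatically or vectorize them blindly. To address this, a vectorization method based on compiler directives is proposed. Vectorization directives inserted into the source code guide the automatic vectorizing compiler, which then generates efficient vectorized code automatically. Test results show that the method effectively improves the run-time performance of the object code.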

6.

POET: a scripting language for applying parameterized source‐to‐source program transformations

Qing Yi 《Software》2012,42(6):675-706


7.

A Contiguity Analysis Algorithm Based on Profile Information and Its Optimization


姚远 赵荣彩 《计算机工程》2012,38(9):28-31

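Building on the Open64 compiler framework, this paper proposes a profile-based algorithm for analyzing the contiguity of data accesses within loops, together with a vectorization optimization method that uses it. Feedback-directed compilation collects run-time contiguity profile information, and structure peeling and data reorganization are then applied to vectorize the program. Experimental results show that, for irregular code, the algorithm provides more precise vectorization information and increases the degree of vectorization.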

8.

A Loop Distribution Algorithm Based on Data Dependence Analysis for Automatic Vectorization

黄磊 姚远 侯永生 杨明 《计算机科学》2011,38(9):288-293

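Loop distribution is an effective way to expose vectorizable code. Because of data dependences in programs, however, it is very hard for current automatic vectorizing compilers to perform complete loop distribution, so they generally use only simple loop distribution methods. Based on data dependence analysis, this paper analyzes the vectorizability of statements according to whether they lie on a dependence cycle, proposes a loop distribution algorithm based on identifying vectorizable statements, and implements it in an automatic vectorizer. The method thoroughly analyzes the vectorizability of individual statements and dependence cycles, then applies loop distribution to place vectorizable and non-vectorizable statements in separate loops. It handles loops that current automatic vectorizing compilers cannot vectorize and works well for some loops with dependences between statements.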
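The cycle-based classification can be sketched in Python as follows. This is a textbook-style illustration built on strongly connected components of the statement dependence graph, not the paper's algorithm; the function names, the dictionary-based graph representation, and the statement labels are assumptions.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm. Returns SCCs in reverse topological order."""
    index, low = {}, {}
    stack, on_stack, result = [], set(), []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return result


def distribute(nodes, edges):
    """Split one loop body into an ordered list of (kind, statements) loops.

    A statement on a dependence cycle (including a self-dependence) goes
    into a "serial" loop; every other statement gets its own "vector" loop.
    """
    comps = strongly_connected_components(nodes, edges)
    comps.reverse()                         # dependence sources come first
    plan = []
    for comp in comps:
        cyclic = len(comp) > 1 or comp[0] in edges.get(comp[0], ())
        plan.append(("serial" if cyclic else "vector", sorted(comp)))
    return plan
```

With `S1: a[i] = b[i] + 1`, `S2: c[i] = c[i-1] + a[i]`, and `S3: d[i] = c[i] * 2`, the dependence edges are `{"S1": ["S2"], "S2": ["S2", "S3"]}`, and `distribute` places S1 and S3 in vector loops on either side of a serial loop for S2.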

9.

Loop Distribution and Aggregation Optimization for Partial Vectorization

韩林 徐金龙 李颖颖 王阳 《计算机科学》2017,44(2):70-74, 81

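Many loops contain a few statements that cannot be vectorized alongside many that can. Loop distribution can usually separate these statements into different loops and so enable partial vectorization of the loop. Current mainstream optimizing compilers support only simple, aggressive loop distribution, which results in excessive loop overhead after vectorization and hinders register and cache reuse. To address these problems, a loop distribution and aggregation method for partial vectorization is proposed. First, two key problems of general loop distribution are analyzed: partitioning the statement set and determining the execution order of the resulting loops. Second, a method for ordering the nodes of the condensation graph for maximal aggregation is proposed to guide loop fusion, reducing loop overhead without affecting parallelism. Finally, the proposed method is validated experimentally. The results show that, for the test cases, the method generates correct vectorized code and significantly improves the execution efficiency of vectorized programs.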

10.

Alias analysis of pointers in Pascal and Fortran 90: dependence analysis between pointer references

Aki Matsumoto D. S. Han Takao Tsuda 《Acta Informatica》1996,33(2):99-130

Vectorization and parallelization of programs written in languages where pointers are used is now a subject of increasing interest. The presence of pointers in programs, however, poses new problems for dependence analysis in vectorizing and parallelizing compilers that were designed to target only FORTRAN77 programs. In this paper, a new method to analyze dependencies between pointer references in Pascal is proposed, which can also be applied to Fortran 90. It is designed to handle programs with dynamic data structures, such as linear linked lists or trees, which are the most common use of pointers. The method has two stages. The first stage is a safe alias analysis which handles any kind of dynamic data structure. The second stage focuses on the specific data structures. It first detects linear linked lists, and then performs dependence analysis between pointer references to the same list. The paper also proposes ways to enhance the second stage. Tree structures are handled here. Loops which manipulate linked lists can now be considered for vectorization by the proposed analysis. Techniques to vectorize such loops are presented in this paper. Some of the proposed algorithms are implemented in V-Pascal, the automatic vectorizing Pascal compiler of our laboratory. The effectiveness of the vectorization of list operations is demonstrated by an experiment on HITAC S-820/80. Received: August 5, 1994/January 17, 1995

11.

A Polynomial-Time Algorithm for Memory Space Reduction

Yonghong Song Cheng Wang Zhiyuan Li 《International journal of parallel programming》2005,33(1):1-33

Reducing memory space requirement is important to many applications. For data-intensive applications, it may help avoid executing the program out-of-core. For high-performance computing, memory space reduction may improve the cache hit rate as well as performance. For embedded systems, it can reduce the memory requirement, the memory latency and the energy consumption. This paper investigates program transformations which a compiler can use to reduce the memory space required for storing program data. In particular, the paper uses integer programming to model the problem of combining loop shifting, loop fusion and array contraction to minimize the data memory required to execute a collection of multi-level loop nests. The integer programming problem is then reduced to an equivalent network flow problem which can be solved in polynomial time.
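A small Python sketch may help show what loop fusion followed by array contraction achieves. It is a hand-written illustration only; the paper's integer programming and network flow formulation is not reproduced, and the function names and the example computation are assumptions.

```python
def unfused(a):
    """Two loops communicating through a full-size temporary array t."""
    n = len(a)
    t = [0] * n
    for i in range(n):
        t[i] = a[i] * 2
    out = [0] * n
    for i in range(n):
        # Reads t[i] and t[i-1]; the element before t[0] is taken as 0.
        out[i] = t[i] + (t[i - 1] if i > 0 else 0)
    return out


def fused_contracted(a):
    """Same result after fusing the loops and contracting t to two scalars."""
    out = [0] * len(a)
    prev = 0                      # plays the role of t[i-1]
    for i in range(len(a)):
        cur = a[i] * 2            # plays the role of t[i]
        out[i] = cur + prev
        prev = cur
    return out
```

After fusion, only the current and previous values of the temporary are live at any time, so the n-element array `t` shrinks to two scalars.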

12.

A Vectorization Method for Indirect Array Indexing

姚金阳 赵荣彩 王琦 李颖颖 《计算机科学》2018,45(9):220-223, 236

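Existing compilers cannot vectorize indirect array indexing efficiently, so code containing this memory access pattern cannot exploit SIMD extensions; this is an active topic in program vectorization research. To use SIMD extensions efficiently and fully exploit the vectorization potential of programs, a new method for vectorizing indirect array indexing is proposed, along with a performance-benefit analysis method that is applied to each kind of indirect array index. Experimental results show that the vectorization method significantly improves program execution efficiency.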

13.

Partially Filled SLP Vectorization in GCC

刘浩浩 韩林 崔平非 《计算机系统应用》2022,31(9):265-271

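As vector lengths keep growing, SIMD extensions can handle more data-level parallelism, but the parallelism threshold a program must meet rises as well. If an existing auto-vectorizing compiler cannot find enough data-level parallelism in the serial code during analysis to fill a vector register completely, it never enters the corresponding vector code transformation phase, and the code is not vectorized. Longer vectors therefore deprive some programs with insufficient parallelism of the chance to be vectorized, causing performance loss. To make fuller use of SIMD units, this paper introduces ISLP, a basic-block-oriented vectorization method for partially filled vectors. Based on the open-source GCC compiler, the design and implementation of ISLP are described in detail in terms of parallelism detection, code generation, and the cost model. Experimental results on standard benchmark suites show that the method can effectively vectorize programs with insufficient superword-level parallelism and improve execution efficiency. The selected test cases achieve an average speedup of 1.14 after vectorization, an 11.8% improvement over the conventional SLP method.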

14.

Practice and Experience with Advanced Vector Optimization Techniques

高念书 张兆庆 《计算机学报》1992,15(9):676-684

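This paper discusses several techniques used to vectorize RSP, a large FORTRAN program for numerical reservoir simulation, on the KJ89 20 vector machine. It analyzes, for the first time, a class of pseudo-dependences between symmetric statements, proposes a practical scheme that introduces temporary arrays, and discusses techniques such as vectorizing logical IF and branch statements.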

15.

A Fast Multi-Template Localization Algorithm Based on Template Clustering and Synthesis

韦燕凤 彭思龙 《中国图象图形学报》2004,9(3):314-317

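For the problem of locating all instances of multiple templates in a single image, a fast object localization method based on multi-template clustering and synthesis is proposed. The method first clusters the templates using a hierarchical clustering algorithm with feedback and, for each cluster, synthesizes a master template from a mathematical model. Each cluster's master template is then used to search and match over the translation space, and the individual templates in the cluster are matched only at positions where the master template matches. Finally, clustering, synthesis, and matching experiments are carried out on edge images. The results show that the algorithm is very effective for multi-template localization in microscope images of integrated circuits.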

16.

Translation and Run-Time Validation of Optimized Code

Lenore Zuck Amir Pnueli Yi Fang Benjamin Goldberg Ying Hu 《Electronic Notes in Theoretical Computer Science》2002,70(4):179-200

The paper presents approaches to the validation of optimizing compilers. The emphasis is on aggressive and architecture-targeted optimizations which try to obtain the highest performance from modern architectures, in particular EPIC-like micro-processors. Rather than verify the compiler, the approach of translation validation performs a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code. First we survey the standard approach to validation of optimizations which preserve the loop structure of the code (though they may move code in and out of loops and radically modify individual statements), present a simulation-based general technique for validating such optimizations, and describe a tool, VOC-64, which implements this technique. For more aggressive optimizations which, typically, alter the loop structure of the code, such as loop distribution and fusion, loop tiling, and loop interchanges, we present a set of permutation rules which establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the necessary extensions to the VOC-64 in order to validate these structure-modifying optimizations. Finally, the paper discusses preliminary work on run-time validation of speculative loop optimizations, which involves using run-time tests to ensure the correctness of loop optimizations whose correctness neither the compiler nor compiler-validation techniques can guarantee. Unlike compiler validation, run-time validation has not only the task of determining when an optimization has generated incorrect code, but also the task of recovering from the optimization without aborting the program or producing an incorrect result. This technique has been applied to several loop optimizations, including loop interchange, loop tiling, and software pipelining, and appears to be quite promising.

17.

A Method for Exploiting Function-Level SIMD Vectorization

李颖颖 高伟 高雨辰 翟胜伟 李朋远 《计算机应用》2017,37(8):2200-2208

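Current vectorization methods for single instruction, multiple data (SIMD) extensions fall into two categories: loop-level vectorization and superword-level parallelism (SLP). To address the inability of current compilers to perform function-level vectorization, a function-level vectorization method based on static single assignment is proposed. The method first analyzes the attributes of program variables, then uses a set of compiler directive clauses, including vector-function, uniform, and linear clauses, to guide the compiler in function-level vectorization, and finally uses the variable attribute results to optimize the vectorized code. Test cases selected from multimedia and image processing were used to evaluate the functionality and performance of the proposed method on the domestically developed Shenwei (申威) platform; compared with serial execution, the function-level vectorized programs run more efficiently. The experimental results show that function-level vectorization can achieve speedups similar to task-level parallelism and that the method can guide the implementation of automatic function-level vectorization.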

18.

Nonorthogonal Image Expansion Related to Optimal Template Matching in Complex Images

《CVGIP: Graphical Models and Image Processing》1994,56(2):149-160

Expansion matching (EXM) is a novel method for template matching that optimizes a new similarity measure called discriminative signal-to-noise ratio (DSNR). Since EXM is designed to minimize off-center response, it yields results with very sharp matching peaks. EXM yields superior performance to the widely used correlation matching (also known as matched filtering), especially in conditions of noise, superposition, and severe occlusion. This paper presents an extended EXM formulation that matches multiple templates in the complex image domain. Complex template matching is useful in matching frequency domain templates and edge gradient images, and can be extended to multispectral images as well. Here, a single filter is designed to simultaneously match a set of given complex templates with optimal DSNR, while eliciting user-defined center responses for each template. It is shown that when the complex case is simplified to the case of matching a single real template, the result reduces exactly to the minimum squared error (MSE) restoration filter assuming the template as the blurring function. Here, we introduce a new generalized MSE restoration paradigm based on the analogy to multiple-template EXM. Furthermore, the output of the single-template EXM filter is also shown to be equivalent to a nonorthogonal expansion of the image with basis functions that are all shifted versions of the template. Experimental results prove that EXM is robust to minor rotation and scale distortions.

19.

An Automatic Vectorization Method for Multimedia Extension Instruction Sets and Real Multimedia Programs

姜伟华 梅超 郭一 朱嘉华 臧斌宇 朱传琪 《计算机学报》2005,28(8):1255-1266

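Automatic vectorizing compilation is an ideal tool for improving the performance of multimedia programs with a processor's multimedia extension instruction set, but current research fails to speed up real programs effectively. The main reasons are that vectorizing ordinary arithmetic operations does not necessarily improve performance, and that typical multimedia operations appear in many different forms in source code and so cannot be fully vectorized. To solve this problem, the paper improves the classical vectorization algorithm so that it vectorizes both kinds of operations flexibly and uniformly. The main improvement is the addition of two steps: unifying the different forms of an operation and identifying operations worth vectorizing. The improved algorithm makes full use of the instruction set to generate efficient code and thus works well on real multimedia programs. The algorithm is also highly extensible.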

20.

Feedback-directed specialization of code

Minhaj Ahmad Khan 《Computer Languages, Systems and Structures》2010,36(1):2-15

Based on feedback information, a large number of optimizations can be performed by the compiler. This information actually indicates the changing behavior of the applications and can be used to specialize code accordingly. Code specialization is a way to facilitate the compiler to perform optimizations by providing the information regarding variables in the code. It is however difficult to select the variables which maximize the benefit of specialization. Also, the overhead of specialization and the increase in code size are the main issues while specializing code. This paper suggests a novel method for improving the performance using specialization based on feedback information and analysis. The code is iteratively specialized after selecting candidate variables by using a heuristic, followed by generation of optimized templates. These templates require a limited set of instructions to be specialized at runtime and are valid for a large number of values. The overhead of runtime specialization is further minimized through optimal software cache of template clones whose instantiation can be performed at static compile time. The experiments have been performed on Itanium-II (IA-64) and Pentium-IV processors using icc and gcc compilers. A significant improvement in terms of execution speed and reduction of code size has been achieved for SPEC and FFTW benchmarks.