首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
基于GUN Fortran编译器,设计并实现了co-array Fortran(CAF)编译器。通过源到源的转换将CAF代码转换为带有运行库调用的Fortran 90程序。典型用例的测试表明,CAF具有较好的可编程性,且CAF程序通过对数据分布的显式控制可获得比OpenMP程序更为高效的执行性能。  相似文献   

2.
This paper describes the design of the Fortran90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems being developed at Syracuse University. Fortran 90D/HPF is a data parallel language with special directives to specify data alignment and distributions. A systematic methodology to process distribution directives of Fortran 90D/HPF is presented. Furthermore, techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented. We believe that the methodology to process data distribution, computation partitioning, communication system design, and the overall compiler design can be used by the implementors of compilers for HPF.  相似文献   

3.
可信编译理论及其核心实现技术:研究综述   总被引:1,自引:0,他引:1       下载免费PDF全文
编译器是重要的系统软件之一,高级语言编写的软件都必须经过编译器的编译才能成为可执行程序。编译器的可信性对于整个计算机系统而言具有非常关键的意义,如果编译器不可信,则很难保证系统所运行软件的可信性。可信编译是指编译器在保证编译正确的同时提供相应的机制保证编译对象的可信性,对可信编译理论和技术的研究具有重要理论意义和实用前景。阐述了可信编译器的概念,介绍了编译过程正确性的形式化定义,对可信编译的主要研究进行了概括。在全面分析可信编译研究现状的基础上,从编译器自身可信性和确保编译对象可信性两个方面,对可信编译器设计和实现的相关理论和方法进行了分类和总结。最后,讨论了可信编译有待解决的问题和未来的研究方向。  相似文献   

4.
This paper presents the design and implementation of a parallelization framework and OpenMP runtime support in Intel® C++ & Fortran compilers for exploiting nested parallelism in applications using OpenMP pragmas or directives. We conduct the performance evaluation of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel C++ compiler on Hyper-Threading Technology (HT) enabled multiprocessor systems. The performance results show that the multithreaded code generated by the Intel compiler achieved a speedup up to 4.69 on 4 processors with HT enabled for five different input video sequences for the H.264 encoder workload, and a 1.28 speedup on an HT enabled single-CPU system and 1.99 speedup on an HT-enabled dual-CPU system for the audio–visual speech recognition workload. The performance gain due to exploiting nested parallelism for leveraging Hyper-Threading Technology is up to 70% for two multimedia workloads under different multiprocessor system configurations. These results demonstrate that hyper-threading benefits can be achieved by exploiting nested parallelism through Intel compiler and runtime system support for OpenMP programs.  相似文献   

5.
In compiling applications for distributed memory machines, runtime analysis is required when data to be communicated cannot be determined at compile-time. One such class of applications requiring runtime analysis is block structured codes. These codes employ multiple structured meshes, which may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present runtime and compile-time analysis for compiling such applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which supports the runtime analysis required. The library is currently implemented on several different systems. We have also developed compiler analysis for determining data access patterns at compile time and inserting calls to the appropriate runtime routines. Our methods can be used by compilers for HPF-like parallel programming languages in compiling codes in which data distribution, loop bounds and/or strides are unknown at compile-time. To demonstrate the efficacy of our approach, we have implemented our compiler analysis in the Fortran 90D/HPF compiler developed at Syracuse University. We have experimented with a multi-bloc Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads and the compiler parallelized codes perform within 20% of the codes parallelized by manually inserting calls to the runtime library  相似文献   

6.
This paper describes two new ideas by which a High Performance Fortran compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of proposed compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a conservative method for compiling irregular loops in which dependence arises only due to reduction operations. This mechanism in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). This paper also presents performance results for these mechanisms from a Fortran 90D compiler implementation  相似文献   

7.
黄春 《计算机科学》2012,39(1):287-289,304
Co-Array Fortran(CAF)已经成为Fortran语言标准的一部分,在科学计算领域逐渐被接受。基于软件共享存储实现了一个CAF编译器,其通过直接的数组赋值实现Co-array数据通信,利用数据垫塞技术提高数据局部性,减少伪共享,优化CAF程序性能。典型科学计算程序测试表明,CAF能够获得和MPI相当的性能。  相似文献   

8.
Co-Array Fortran, formally called F––, is a small set of extensions to Fortran 90/95 for Single-Program-Multiple-Data (SPMD) parallel processing. OpenMP Fortran is a set of compiler directives that provide a high level interface to threads in Fortran, with both thread-local and thread-shared memory. OpenMP is primarily designed for loop-level directive-based parallelization, but it can also be used for SPMD programs by spawning multiple threads as soon as the program starts and having each thread then execute the same code independently for the duration of the run. The similarities and differences between these two SPMD programming models are described.Co-Array Fortran can be implemented using either threads or processes, and is therefore applicable to a wider range of machine types than OpenMP Fortran. It has also been designed from the ground up to support the SPMD programming style. To simplify the implementation of Co-Array Fortran, a formal Subset is introduced that allows the mapping of co-arrays onto standard Fortran arrays of higher rank. An OpenMP Fortran compiler can be extended to support Subset Co-Array Fortran with relatively little effort.  相似文献   

9.
一种HPF编译系统的研究与实现*   总被引:8,自引:1,他引:8  
HPF(high performance Fortran)是一种典型的数据并行语言,HPF编译系统的实现是并行计算研究领域的一个难点.文章介绍了一个HPF编译系统的研究与实现情况,在对该系统的主要组成进行了简要介绍之后,着重讨论了系统实现中的若干关键技术,并列出了部分HPF源程序及其编译器生成的相应代码,最后给出了对该编译器的一些性能测试结果和有关问题的讨论.  相似文献   

10.
In many scientific applications, arrays containing data are indirectly indexed through indirection arrays. Such scientific applications are called irregular programs and are a distinct class of applications that require special techniques for parallelization. This paper presents a library called CHAOS, which helps users implement irregular programs on distributed-memory message-passing machines, such as the Paragon, Delta, CM-5 and SP-1. The CHAOS library provides efficient runtime primitives for distributing data and computation over processors; it supports efficient index translation mechanisms and provides users high-level mechanisms for optimizing communication. CHAOS subsumes the previous PARTI library and supports a larger class of applications. In particular, it provides efficient support for parallelization of adaptive irregular programs where indirection arrays are modified during the course of computation. To demonstrate the efficacy of CHAOS, two challenging real-life adaptive applications were parallelized using CHAOS primitives: a molecular dynamics code, CHARMM, and a particle-in-cell code, DSMC. Besides providing runtime support to users, CHAOS can also be used by compilers to automatically parallelize irregular applications. This paper demonstrates how CHAOS can be effectively used in such a framework. By embedding CHAOS primitives in the Syracuse Fortran 90D/HPF compiler, kernels taken from the CHARMM and DSMC codes have been automatically parallelized.  相似文献   

11.
This paper describes the results of the research for implementing applications for concurrent execution on the CYBERPLUS multiparallel computer system. Three layers of parallelism are built into this system: instruction, computation and functional levels. The paper contains a short summary of the hardware, and the different layers of the software for multiprocessor systems. They are based on the previously developed CPFTN (Cyberplus Fortran Compiler) compiler which produces optimized object code for a single CYBERPLUS processor, exploiting through a series of horizontal and vertical optimizations the internal parallelism present in this processor. Unlike other compilers, the input to CPFTN consists of an entire kernel containing all subroutines and functions loaded into a single processor. An interprocessor communications subsystem provides the low-level component of the multiprocessor system and contains the data transfer and synchronization capabilities. The high-level component consists of extensions to FORTRAN (CPMFTN, Cyberplus Multiprocessor Fortran) for characterizing the data shared between processors and implements concepts specially adapted to the private memory architecture of the CYBERPLUS as well as to the specific needs of the user community targetted by this product. The resulting scheme is a large grain size, demand driven, data flow type. It is assumed that partition between processors can best be done on the bases of the high-level knowledge of the problem possessed by the user. The corresponding kernels may be created independently of each other, using the new language features introduced to characterize data shared between processors. Scheduling, synchronization, as well as data transfer operations are automatically distributed by the compiler across the subroutines and functions constituting the grain size of the partition. The first version of CPMFTN under development is described followed by several simple application paradigms and discussion of the merits of possible future extensions.  相似文献   

12.
UPC并行循环优化的研究与实现   总被引:2,自引:0,他引:2  
UPC(UnifiedParallelC)是一种新型的基于全局地址空间(GlobalAddressSpace,简称GAS)访问的并行编程语言,支持SPMD(SingleProgramMulti-Data)编程模式。论文主要研究UPC原型系统的编译器优化技术的算法与实现,该UPC原型系统是建立在开放源码的BerkeleyUPC编译器基础之上的。目前该原型系统已实现了upc_forall优化和共享访问私有化,使得一部分UPC并行应用程序的效率得到了明显改善。  相似文献   

13.
The stream architecture is a novel microprocessor architecture with wide application potential. It is critical to study how to use the stream architecture to accelerate scientific computing programs. However, existing stream processors and stream programming languages are not designed for scientific computing. To address this issue, we design and implement a 64-bit stream processor, Fei Teng 64 (FT64), which has a peak performance of 16 Gflops. FT64 supports two kinds of communications, message passing and stream communications, based on which, an interconnection architecture is designed for a FT64-based high-performance computer. This high-performance computer contains multiple modules, with each module containing eight FT64s. We also design a novel stream programming language, Stream Fortran 95 (SF95), together with the compiler SF95Compiler, so as to facilitate the development of scientific applications. We test nine typical scientific application kernels on our FT64 platform to evaluate this design. The results demonstrate the effectiveness and efficiency of FT64 and its compiler for scientific computing.  相似文献   

14.
High Performance Fortran (HPF) is a data-parallel language that provides a high-level interface for programming scientific applications, while delegating to the compiler the task of generating explicitly parallel message-passing programs. This paper provides an overview of HPF compilation and runtime technology for distributed-memory architectures, and deals with a number of topics in some detail. In particular, we discuss distribution and alignment processing, the basic compilation scheme and methods for the optimization of regular computations. A separate section is devoted to the transformation and optimization of independent loops with irregular data accesses. The paper concludes with a discussion of research issues and outlines potential future development paths of the language.  相似文献   

15.
The Portable Parallelizing Fortran Compiler (PPFC) is an additional component for the portable programming environment developed in Tel-Aviv University for scientific code. This environment supports portable and efficient programming of diverse MIMD multiprocessors, both distributed- and shared-memory. Till now this environment has consisted of two tools: the Virtual Machine for MultiProcessors (VMMP) and the Portable Parallelizing Pascal compiler (P3C). We have added the PPFC which is an automatic parallelizer compiler for the Fortran language. The compiler is fully automatic (does not require additional declarations to assist parallelization), which is characterized by loops operating on regular data structures, and produces efficient and portable code for a variety of multiprocessors from the same serial code. The parallel implementation uses the VMMP, which is a software package that provides a coherent set of services for explicitly parallel application programs running on diverse MIMD multiprocessors. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. The PPFC parallelized 12 out of the 24 Livermore Loops. It was also applied to parallelize all the 14 Fortran application programs that where parallelized by the P3C and achieved the same speed-ups and efficiencies. In most examples the PPFC achieved high speed-ups and efficiencies on all target multiprocessors. The PPFC emphasizes efficiency and code portability. Although PPFC employs a relatively simple data flow analysis, it produces efficient code for various widely used application programs.  相似文献   

16.
The article discusses attitudes about “automatic programming”, the economics of programming, and existing programming systems, all in the early 1950s. It describes the formation of the Fortran group, its knowledge of existing systems, its plans for Fortran, and the development of the language in 1954. It describes the development of the optimizing compiler for Fortran I, of various language manuals, and of Fortran II and III. It concludes with remarks about later developments and the impact of Fortran and its successors on programming today  相似文献   

17.
Historically, the principal achievement of compiler technology has been to make it possible to program in a high-level, machine-independent style. The absence of compiler technology to provide such a style for parallel computers is the main reason these systems have not found widespread acceptance. This paper examines the prospects for machine-independent parallel programming, concentrating on Fortran D and High Performance Fortran, which support machine-independent expression of “data parallelism.”  相似文献   

18.
支持多语种、多平台是当前编译技术的发展趋势,它适应了当前计算机系统迅猛更新的需要。文中提出了一个适用于不同计算机平台的多平台Ada95编译系统的模型,在此基础上讨论了Ada95编译系统中多平台支持的三个关键技术:平台配置机制、运行库多平台支持技术和多平台代码生成支持策略,并描述了相应的设计与实现方法。  相似文献   

19.
Array statements as included in Fortran 90 or High Performance Fortran (HPF) are a well-accepted way to specify data parallelism in programs. When generating code for such a data parallel program for a private memory parallel system, the compiler must determine when array elements must be moved from one processor to another. This paper describes a practical method to compute the set of array elements that are to be moved; it covers all the distributions that are included in HPF: block, cyclic, and block-cyclic. This method is the foundation for an efficient protocol for modern private memory parallel systems: for each block of data to be sent, the sender processor computes the local address in the receiver′s address space, and the address is then transmitted together with the data. This strategy increases the communication load but reduces the overhead on the receiving processor. We implemented this optimization in an experimental Fortran compiler, and this paper reports an empirical evaluation on a 64-node private memory iWarp system, using a number of different distributions.  相似文献   

20.
CHAU-WEN TSENG 《Software》1997,27(7):763-796
Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D compiler when compiling linear algebra codes and whole programs. Statement groups, execution conditions, inter-loop communication optimizations, multi-reductions, and array kills for replicated arrays are identified as new compilation issues. On the Intel iPSC/860, the output of the prototype Fortran D compiler approaches the performance of hand-optimized code for parallel computations, but needs improvement for linear algebra and pipelined codes. The Fortran D compiler outperforms the CM Fortran compiler (2.1 beta) by a factor of four or more on the TMC CM-5 when not using vector units. Its performance is comparable to the DEC and IBM HPF compilers on a Alpha cluster and SP-2. Better analysis, run-time support, and flexibility are required for the prototype compiler to be useful for a wider range of programs. © 1997 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号