首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Optimizing communication is a key issue in generating efficient SPMD codes in compiling distributed arrays on data parallel languages, such as High Performance Fortran. In HPF, the array distribution may involve alignment and cyclic(k)-distribution such that the enumeration of the local set and the enumeration of the communication set exhibit regular patterns which can be modeled as integer lattices. In the special case of unit-strided alignment, many techniques of the communication set enumeration have been proposed, while in the general case of the non-unit-strided alignment, inspector-like run-time codes are needed to build repeating pattern table or to scan over local elements such that the communication set can be constructed. Unlike other works on this problem of the general alignment and cyclic(k) distribution, our approach derives an algebraic solution for such an integer lattice that models the communication set by using the Smith-Normal-Form analysis, therefore, efficient enumeration of the communication set can be generated. Based on the integer lattice, we also present our algorithm for the SPMD code generation. In our approach, when the parameters are known, the SPMD program can be efficiently constructed without any inspector-like run-time codes  相似文献   

2.
High Performance Fortran (HPF) is a data-parallel language that provides a high-level interface for programming scientific applications, while delegating to the compiler the task of generating explicitly parallel message-passing programs. This paper provides an overview of HPF compilation and runtime technology for distributed-memory architectures, and deals with a number of topics in some detail. In particular, we discuss distribution and alignment processing, the basic compilation scheme and methods for the optimization of regular computations. A separate section is devoted to the transformation and optimization of independent loops with irregular data accesses. The paper concludes with a discussion of research issues and outlines potential future development paths of the language.  相似文献   

3.
Vienna Fortran, High Performance Fortran (HPF), and other data parallel languages have been introduced to allow the programming of massively parallel distributed-memory machines (DMMP) at a relatively high level of abstraction, based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the processors of a machine. In this paper, we use Vienna-Fortran as a general framework for dealing with sparse data structures. We describe new methods for the representation and distribution of such data on DMMPs, and propose simple language features that permit the user to characterize a matrix as “sparse” and specify the associated representation. Together with the data distribution for the matrix, this enables the complier and runtime system to translate sequential sparse code into explicitly parallel message-passing code. We develop new compilation and runtime techniques, which focus on achieving storage economy and reducing communication overhead in the target program. The overall result is a powerful mechanism for dealing efficiently with sparse matrices in data parallel languages and their compilers for DMMPs  相似文献   

4.
Fortran D is a version of Fortran extended with data decomposition specifications. It is designed to provide a machine-independent programming model for data-parallel applications and has heavily influenced the design of High Performance Fortran (HPF). In previous work we described Fortran D compilation algorithms for individual procedures. This paper presents an interprocedural approach to analyze data and computation partitions, optimize communication, support dynamic data decomposition, and perform other tasks required to compile Fortran D programs. Our algorithms are designed to make interprocedural compilation efficient. First, we collect summary information after edits to solve important data-flow problems in a separate interprocedural propagation phase. Second, for nonrecursive programs we compile procedures in reverse topological order to propagate additional interprocedural information during code generation. We thus limit compilation to a single pass over each procedure body. We also perform optimizations across procedure boundaries by delaying instantiation of the computation partition, communication, and dynamic data decomposition. Empirical results show that interprocedural optimization is crucial in achieving acceptable performance for a common application code.  相似文献   

5.
GAGAN AGRAWAL  JOEL SALTZ 《Software》1997,27(5):519-545
Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications having irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of runtime preprocessing routine and collective communication routines inserted for managing communication in such codes. We also present two new interprocedural optimizations: placement of scatter routines and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D compilation system as the necessary infrastructure. We present experimental results from two codes compiled usng our system to demonstrate the efficacy of the presented schemes. ©1997 John Wiley & Sons, Ltd.  相似文献   

6.
Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. The algorithm avoids generating redundant communication by providing a framework for combining information on data dependence, communication, and reuse. It also describes a method of generating messages to exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results have shown its effectiveness on distributed memory machines, such as the RISC System/6000 Scalable POWERparallel System. This paper also discusses the architectural problems of efficient optimization.  相似文献   

7.
8.
The ParaScope Editor is a new kind of interactive parallel programming tool for developing scientific Fortran programs. It assists the knowledgeable user by displaying the results of sophisticated program analyses and by providing editing and a set of powerful interactive transformations. After an edit or parallelism-enhancing transformation, the ParaScope Editor incrementally updates both the analyses and source quickly. This paper describes the underlying implementation of the ParaScope Editor, paying particular attention to the analysis and representation of dependence information and its reconstruction after changes to the program.  相似文献   

9.
Adaptive grid refinement in Fortran (AGRIF) is a Fortran90 package for the integration of adaptive mesh refinement (AMR) features within existing finite difference codes. The package first provides model-independent Fortran90 procedures containing the different operations in an AMR process: time integration of grid hierarchy, clustering, interpolations, updates, etc. The package then creates the Fortran90 model-dependent part of the code based on an entry file written by the user.The basic idea of AGRIF is to make use of Fortran90 pointers to successively address the variables of the different grids of an AMR process. As pointers can be used exactly like other (static) variables in Fortran, most of the original code will remain unchanged.  相似文献   

10.
Data parallel languages, like High Performance Fortran (HPF), support the notion of distributed arrays. However, the implementation of such distributed array structures and their access on message passing computers is not straightforward. This holds especially for distributed arrays that are aligned to each other and given a block-cyclic distribution. In this paper, an implementation framework is presented for HPF distributed arrays on message passing computers. Methods are presented for efficient (in space and time) local index enumeration, local storage, and communication. Techniques for local set enumeration provide the basis for constructing local iteration sets and communication sets. It is shown that both local set enumeration and local storage schemes can be derived from the same equation. Local set enumeration and local storage schemes are shown to be orthogonal, i.e., they can be freely combined. Moreover, for linear access sequences generated by our enumeration methods, the local address calculations can be moved out of the enumeration loop, yielding efficient local memory address generation. The local set enumeration methods are implemented by using a relatively simple general transformation rule for absorbing ownership tests. This transformation rule can be repeatedly applied to absorb multiple ownership tests. Performance figures are presented for local iteration overhead, a simple communication pattern, and storage efficiency  相似文献   

11.
This paper describes a high-level library (The Nearest Neighbor Tool, NNT) that has been used to parallelize operational weather prediction models. NNT is part of the Scalable Modeling System (SMS), developed at the Forecast Systems Laboratory (FSL). Programs written in NNT rely on SMS's run-time system and port between a wide range of computing platforms, performing well in multiprocessor systems. We show, using examples from operational weather models, how large Fortran 77 codes can be parallelized using NNT. We compare the ease of programmability of NNT and High Performance Fortran (HPF). We also discuss optimizations like data movement overlap (in interprocessor communication and I/O operations), and the minimization of data exchanges through the use of redundant computations. We show that although HPF provides a simpler programming interface, NNT allows for program optimizations that increase performance considerably and still keeps a simple user interface. These optimizations have proven essential to run weather prediction models in real time, and HPF compilers should incorporate them in order to meet operational demands. Throughout the paper, we present performance results of weather models running on a network of workstations, the Intel Paragon, and the SGI Challenge. Finally, we study the cost of programming global address space architectures with NNT's local address space paradigm.  相似文献   

12.
This paper describes two new ideas by which a High Performance Fortran compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of proposed compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a conservative method for compiling irregular loops in which dependence arises only due to reduction operations. This mechanism in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). This paper also presents performance results for these mechanisms from a Fortran 90D compiler implementation  相似文献   

13.
An integrated computer programme in Fortran IV for continuous or discrete non linear programming problems is presented. Several recent techniques and algorithms for non-linear programming have been adapted and new ideas have been introduced. They include the minimax and exterior-point approaches to non-linear programming, least pth optimization and the Dakin tree-search algorithm. The user may optionally choose the combination of techniques and algorithms best suited to his problems. Since many practical design problems can be easily formulated as non-linear programming problems, the programme, called DISOPT, enjoys a very wide range of applications such as continuous and discrete tolerance assignments, digital filter design, circuit design, system modelling and approximation problems. Numerical results for a number of functions and circuit tolerance optimization problems are presented in this paper to demonstrate the performance of DISOPT.  相似文献   

14.
Historically, the principal achievement of compiler technology has been to make it possible to program in a high-level, machine-independent style. The absence of compiler technology to provide such a style for parallel computers is the main reason these systems have not found widespread acceptance. This paper examines the prospects for machine-independent parallel programming, concentrating on Fortran D and High Performance Fortran, which support machine-independent expression of “data parallelism.”  相似文献   

15.
In this paper, we describe our experience with developing Airshed, a large pollution modeling application, in the Fx programming environment. We demonstrate that high level parallel programming languages like Fx and High Performance Fortran offer a simple and attractive model for developing portable and efficient parallel applications. Performance results are presented for the Airshed application executing on Intel Paragon and Cray T3D and T3E parallel computers. The results demonstrate that the application is “performance portable,” i.e., it achieves good and consistent performance across different architectures, and that the performance can be explained and predicted using a simple model for the communication and computation phases in the program. We also show how task parallelism was used to alleviate I/O related bottlenecks, an important consideration in many applications. Finally, we demonstrate how external parallel modules developed using different parallelization methods can be integrated in a relatively simple and flexible way with modules developed in the Fx compiler framework. Overall, our experience demonstrates that a high level parallel programming environment based on a language like HPF is suitable for developing complex multidisciplinary applications.  相似文献   

16.
Array statements are often used to express data-parallelism in scientific languages such as Fortran 90 and High Performance Fortran. In compiling array statements for a distributed-memory machine, efficient generation of communication sets and local index sets is important. We show that for arrays distributed block-cyclically on multiple processors, the local memory access sequence and communication sets can be efficiently enumerated as closed forms using regular sections. First, closed form solutions are presented for arrays that are distributed using block or cyclic distributions. These closed forms are then used with avirtual processor approachto give an efficient solution for arrays with block-cyclic distributions. This approach is based on viewing a block-cyclic distribution as a block (or cyclic) distribution on a set of virtual processors, which are cyclically (or block-wise) mapped to physical processors. These views are referred to asvirtual-blockorvirtual-cyclicviews, depending on whether a block or cyclic distribution of the array on the virtual processors is used. The virtual processor approach permits different schemes based on the combination of the virtual processor views chosen for the different arrays involved in an array statement. These virtualization schemes have different indexing overhead. We present a strategy for identifying the virtualization scheme which will have the best performance. Performance results on a Cray T3D system are presented for hand-compiled code for array assignments. These results show that using the virtual processor approach, efficient code can be generated for execution of array statements involving block-cyclically distributed arrays.  相似文献   

17.
Wang  Hui  Guo  Minyi  Wei  Daming 《The Journal of supercomputing》2004,29(2):157-170
In order to achieve higher load balancing, it is necessary to solve irregular block redistribution problems, which are different from regular block-cyclic redistribution. High Performance Fortran version 2 (HPF-2) provides irregular distribution functionalities, such as GEN_BLOCK and INDIRECT. This paper is devoted to develop an efficient algorithm that attempts to obtain near optimal scheduling while satisfying the conditions of minimal message size of total steps and the minimal number of steps for irregular array redistribution. The algorithm intends to decrease the computation costs by dividing the whole block into sub-blocks and solving the sub-problems accordingly, and then merging them together to get final results. Simulation results show that our algorithm has comparable performance with a relocation algorithm developed previously (H. Yook and M. Park. Proceedings of the IASTED International Conference Parallel and Distributed Computingand Systems, Nov. 3–6, MIT, Boston, USA, 1999).  相似文献   

18.
How can a program written in pure applicative LISP be reused in a Fortran environment? One answer is by automatically transforming it from LISP into Fortran. In this paper we discuss a practical application of this technique-one that yields an efficient Fortran program. We view this process as an example of abstract programming, in which the LISP program constitutes an abstract specification for the Fortran version. The idea of strategy-a strategy for getting from LISP to Fortran-is basic to designing and applying the transformations. One strategic insight is that the task is easier if the LISP program is converted to ``recursive' Fortran, and then the recursive Fortran program is converted to nonrecursive standard Fortran. Another strategic insight is that much of the task can be accomplished by converting the program from one canonical form to another. Developing a strategy also involves making various implementation decisions. One advantage of program transformation methodology is that it exposes such decisions for examination and review. Another is that it enables optimizations to be detected and implemented easily. Once a strategy has been discovered, it can be implemented by means of rewrite-rule transformations using the TAMPR program transformation system. The transformational approach to program reuse based on this strategy has a measure of elegance. It is also practical-the resulting Fortran program is 25 percent faster than its compiled LISP counterpart, even without extensive optimization.  相似文献   

19.
Pure data-parallel languages such as High Performance Fortran version 1 (HPF) do not allow efficient expression of mixed task/data-parallel computations or the coupling of separately compiled data-parallel modules. In this paper, we show how these common parallel program structures can be represented, with only minor extensions to the HPF model, by using a coordination library based on the Message Passing Interface (MPI). This library allows data-parallel tasks to exchange distributed data structures using calls to simple communication functions. We present microbenchmark results that characterize the performance of this library and that quantify the impact of optimizations that allow reuse of communication schedules in common situations. In addition, results from two-dimensional FFT, convolution, and multiblock programs demonstrate that the HPF/MPI library can provide performance superior to that of pure HPF. We conclude that this synergistic combination of two parallel programming standards represents a useful approach to task parallelism in a data-parallel framework, increasing the range of problems addressable in HPF without requiring complex compiler technology.  相似文献   

20.
This paper introduces a data distribution scheme and an alignment algorithm for parallel volume rendering. The algorithm performs a single wrap-around shear transformation which requires only a regular inter-processor communication pattern. The alignment can be implemented incrementally consisting of short distance shifts, thus significantly reducing the communication overhead. The alignment process is a non-destructive transformation, consisting of a single non-scaling shear operation. This is a unique feature which provides the basis for the incremental algorithm.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号