1.

《Advances in Engineering Software》2001,32(8):665-671

The extended full-potential (FPX) helicopter rotor computational fluid dynamics (CFD) code, written in Fortran, is successfully converted in its reduced two-dimensional version into a parallel version for multiprocessing. The FPX code, which has an internal grid generator, solves the compressible full-potential equation using an approximately factored finite-difference scheme with numerous added physical modeling enhancements, including viscous boundary layers, shock-induced entropy corrections and wake-vortex embedding. The parallel version of the code uses Open Multi-Processing (OpenMP) directives as the parallel programming tool in a shared-memory (SM) environment. The OpenMP code is portable and scalable and can run on various computer platforms, including UNIX and Windows NT platforms. A performance study of the parallel code on the SGI Origin 2000 UNIX platform is presented. The results show that reasonable speedups are obtained through parallelization and that OpenMP is an easy-to-use and efficient parallel programming tool for the present problem.

2.

On the use of diagnostic dependence-analysis tools in parallel programming: Experiences using PTOOL

Leslie A. Henderson Robert E. Hiromoto Olaf M. Lubeck Margaret L. Simmons 《The Journal of supercomputing》1990,4(1):83-96

Although considerable technology has been developed for debugging and developing sequential programs, producing verifiably correct parallel code is a much harder task. In view of the large number of possible scheduling sequences, exhaustive testing is not a feasible method for determining whether a given parallel program is correct; nor have there been sufficient theoretical developments to allow the automatic verification of parallel programs. PTOOL, a tool being developed at Rice University in collaboration with users at Los Alamos National Laboratory, provides an alternative mechanism for producing correct parallel code. PTOOL is a semi-automatic tool for detecting implicit parallelism in sequential Fortran code. It uses vectorizing compiler techniques to identify dependences preventing the parallelization of sequential regions. According to the model supported by PTOOL, a programmer should first implement and test his program using traditional sequential debugging techniques. Then, using PTOOL, he can select loop bodies that can be safely executed in parallel. At Los Alamos, we have been interested in examining the role of dependence-analysis tools in the parallel programming process. Therefore, we have used PTOOL as a static debugging tool to analyze parallel Fortran programs. Our experiences using PTOOL lead us to conclude that dependence-analysis tools are useful to today's parallel programmers. Dependence-analysis is particularly useful in the development of asynchronous parallel code. With a tool like PTOOL, a programmer can guarantee that processor scheduling cannot affect the results of his parallel program. If a programmer wishes to implement a partially parallelized region through the use of synchronization primitives, however, he will find that dependence analysis is less useful. 
While a dependence-analysis tool can greatly simplify the task of writing synchronization code, the ultimate responsibility of correctness is left to the programmer. This work was performed under the auspices of the U.S. Department of Energy.
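
The distinction the abstract draws between loops that are safe to run in parallel and loops that are not can be illustrated with a small sketch. This is not PTOOL's algorithm, which works by static analysis of Fortran source; it is a hypothetical brute-force check in Python that reruns a loop under different iteration orders. The function names (`independent_loop`, `dependent_loop`, `order_sensitive`) are invented for illustration.

```python
import random

def independent_loop(a, order):
    """Each iteration writes b[i] from a[i] only: no loop-carried dependence."""
    b = [0] * len(a)
    for i in order:
        b[i] = 2 * a[i]
    return b

def dependent_loop(a, order):
    """Iteration i reads b[i - 1], which iteration i - 1 writes: loop-carried dependence."""
    b = list(a)
    for i in order:
        if i > 0:
            b[i] = b[i - 1] + a[i]
    return b

def order_sensitive(loop, a, trials=20, seed=0):
    """Return True if some iteration order changes the loop's result.

    Tries the reversed order first, then a few random shuffles. A real
    dependence analyzer decides this statically instead of by execution.
    """
    n = len(a)
    reference = loop(a, list(range(n)))
    rng = random.Random(seed)
    orders = [list(reversed(range(n)))]
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        orders.append(order)
    return any(loop(a, order) != reference for order in orders)

data = list(range(1, 9))
print(order_sensitive(independent_loop, data))  # False
print(order_sensitive(dependent_loop, data))    # True
```

A loop whose result is insensitive to iteration order is a candidate for asynchronous parallel execution; the dependent loop would need synchronization or restructuring first.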

3.

A Comparison of Co-Array Fortran and OpenMP Fortran for SPMD Programming

Alan J. Wallcraft 《The Journal of supercomputing》2002,22(3):231-250

Co-Array Fortran, formerly called F--, is a small set of extensions to Fortran 90/95 for Single-Program-Multiple-Data (SPMD) parallel processing. OpenMP Fortran is a set of compiler directives that provide a high-level interface to threads in Fortran, with both thread-local and thread-shared memory. OpenMP is primarily designed for loop-level directive-based parallelization, but it can also be used for SPMD programs by spawning multiple threads as soon as the program starts and having each thread then execute the same code independently for the duration of the run. The similarities and differences between these two SPMD programming models are described. Co-Array Fortran can be implemented using either threads or processes, and is therefore applicable to a wider range of machine types than OpenMP Fortran. It has also been designed from the ground up to support the SPMD programming style. To simplify the implementation of Co-Array Fortran, a formal Subset is introduced that allows the mapping of co-arrays onto standard Fortran arrays of higher rank. An OpenMP Fortran compiler can be extended to support Subset Co-Array Fortran with relatively little effort.
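
The SPMD style described above, in which every thread or image runs the same code and selects its own share of the work from its rank, can be sketched as follows. This is a sequential Python illustration only, not Co-Array Fortran or OpenMP; `block_range` and `spmd_sum` are hypothetical names, and the loop over ranks stands in for workers that would run concurrently.

```python
def block_range(rank, nranks, n):
    """Half-open index range [lo, hi) owned by `rank` when n items are split into nranks blocks."""
    base, extra = divmod(n, nranks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def spmd_sum(values, nranks):
    """Run the same 'program' once per rank; each rank sums only its own block."""
    partials = []
    for rank in range(nranks):  # stands in for nranks concurrent threads or images
        lo, hi = block_range(rank, nranks, len(values))
        partials.append(sum(values[lo:hi]))
    return sum(partials)  # final reduction across ranks

values = list(range(10))
print([block_range(r, 3, len(values)) for r in range(3)])  # [(0, 4), (4, 7), (7, 10)]
print(spmd_sum(values, 3))  # 45
```

The block partition gives the first `n % nranks` ranks one extra item, so the ranges cover every index exactly once.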

4.

An easily implemented task-based parallel scheme for the Fourier pseudospectral solver applied to 2D Navier-Stokes turbulence

Z. Yin H.J.H. Clercx 《Computers & Fluids》2004,33(4):509-520

An efficient parallel scheme is proposed for performing direct numerical simulation (DNS) of two-dimensional Navier-Stokes turbulence at high Reynolds numbers. We illustrate the resulting numerical code by displaying relaxation to states close to those that have been predicted by statistical-mechanical methods which start from ideal (Euler) fluid mechanics. The validation of these predictions by DNS requires unusually long computation times on single-CPU workstations, and suggests the use of parallel computation. The performance of our MPI Fortran 90 code on the SGI Origin 3800 is reported, together with its comparison with another parallel method. A few computational results that illustrate tests of the statistical-mechanical predictions are presented.

5.

Porting a global ocean model onto a shared-memory multiprocessor: Observations and guidelines

Richard J. Procassini Scott R. Whitman William P. Dannevik 《The Journal of supercomputing》1993,7(3):287-321

A three-dimensional global ocean circulation model has been modified to run on the BBN TC2000 multiple instruction stream/multiple data stream (MIMD) parallel computer. Two shared-memory parallel programming models have been used to implement the global ocean model on the TC2000: the TCF (TC2000 Fortran) fork-join model and the PFP (Parallel Fortran Preprocessor) split-join model. The method chosen for the parallelization of this global ocean model on a shared-memory MIMD machine is discussed. The performance of each version of the code has been measured by varying the processor count for a fixed-resolution test case. The statically scheduled PFP version of the code achieves a higher parallel computing efficiency than does the dynamically scheduled TCF version of the code. The observed differences in the performance of the TCF and PFP versions of the code are discussed. The parallel computing performance of the shared-memory implementation of the global ocean model is limited by several factors, most notably load imbalance and network contention. The experience gained while porting this large, real-world application onto a shared-memory multiprocessor is also presented to provide insight to the reader who may be contemplating such an undertaking.

6.

Nested Parallelization with OpenMP

Dieter an Mey Samuel Sarholz Christian Terboven 《International journal of parallel programming》2007,35(5):459-476

OpenMP is widely accepted as a de facto standard for shared-memory parallel programming in Fortran, C and C++. Nested parallelization was included in the first OpenMP specification, but it took a few years until the first commercially available compilers supported this optional part of the specification. We employed nested parallelization using OpenMP in three production codes: a C++ code for content-based image retrieval, a C++ code for the computation of critical points in multi-block CFD datasets, and a multi-block Navier-Stokes solver written in Fortran 90. In this paper we discuss the opportunities as well as the deficiencies of the nested parallelization support in OpenMP.

7.

Scalable and portable implementation of the fast multipole method on parallel computers

Shuji Ogata Rajiv K Kalia Aiichiro Nakano Priya Vashishta Satyavani Vemparala 《Computer Physics Communications》2003,153(3):445-461

A scalable and portable Fortran code is developed to calculate Coulomb interaction potentials of charged particles on parallel computers, based on the fast multipole method. The code has a unique feature to calculate microscopic stress tensors due to the Coulomb interactions, which is useful in constant-pressure simulations and local stress analyses. The code is applicable to various boundary conditions, including periodic boundary conditions in two and three dimensions, corresponding to slab and bulk systems, respectively. Numerical accuracy of the code is tested through comparison of its results with those obtained by the Ewald summation method and by direct calculations. Scalability tests show a parallel efficiency of 0.98 for 512 million charged particles on 512 IBM SP3 processors. The timing results on IBM SP3 are also compared with those on IBM SP4.
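
To put the reported efficiency figure in context, the sketch below uses the conventional definitions of speedup and parallel efficiency. The abstract does not say which definition or baseline the authors used (for example, a fixed total problem size or a fixed size per processor), so the relation shown is an assumption, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup: reference run time divided by parallel run time."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, nprocs):
    """Conventional efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / nprocs

def implied_speedup(efficiency, nprocs):
    """Speedup implied by a given efficiency under the definition above."""
    return efficiency * nprocs

# Under this assumed definition, an efficiency of 0.98 on 512 processors
# would correspond to a speedup of roughly 502.
print(implied_speedup(0.98, 512))
```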

8.

Start/Pat: a parallel-programming toolkit

Appelbe B. Smith K. McDowell C. 《Software, IEEE》1989,6(4):29-38

The authors address the question of how to use existing sequential Fortran code on multiprocessors. Their answer is Start/Pat, an interactive toolkit that automates the parallelization of sequential Fortran as it teaches the programmer how to exploit and understand parallel structures and architectures. The Start/Pat prototype has been installed at several user sites. The authors discuss the choice of PCF Fortran, the toolkit components, and the features of Pat and Start.

9.

Vienna-Fortran/HPF extensions for sparse and irregular problems and their compilation

Ujaldon M. Zapata E.L. Chapman B.M. Zima H.P. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(10):1068-1083

Vienna Fortran, High Performance Fortran (HPF), and other data parallel languages have been introduced to allow the programming of massively parallel distributed-memory machines (DMMPs) at a relatively high level of abstraction, based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the processors of a machine. In this paper, we use Vienna-Fortran as a general framework for dealing with sparse data structures. We describe new methods for the representation and distribution of such data on DMMPs, and propose simple language features that permit the user to characterize a matrix as “sparse” and specify the associated representation. Together with the data distribution for the matrix, this enables the compiler and runtime system to translate sequential sparse code into explicitly parallel message-passing code. We develop new compilation and runtime techniques, which focus on achieving storage economy and reducing communication overhead in the target program. The overall result is a powerful mechanism for dealing efficiently with sparse matrices in data parallel languages and their compilers for DMMPs.

10.

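Design and Implementation of a Parallel Debugger for the Dawning (曙光) Cluster System

陈勇 何克东 陆在朝 鄢超 《Computer Engineering (计算机工程)》2004,30(9):50-52

This paper presents DCDB, a parallel debugger designed and implemented for the Dawning cluster system. DCDB supports debugging parallel applications written in C or Fortran with MPI or PVM, implements record/replay parallel debugging, and supports cyclic debugging, which addresses the nondeterminism of parallel programs during debugging. DCDB adopts a Client/Server/Client architecture and has a friendly graphical user interface. The system is developed mainly in Java and offers good portability and extensibility.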

11.

Automatic extraction of functional parallelism from ordinary programs

Girkar M. Polychronopoulos C.D. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(2):166-178

Presents the hierarchical task graph (HTG) as an intermediate parallel program representation which encapsulates minimal data and control dependences, and which can be used for the extraction and exploitation of functional, or task-level parallelism. The hierarchical nature of the HTG facilitates efficient task-granularity control during code generation, and thus applicability to a variety of parallel architectures. The construction of the HTG at a given hierarchy level, the derivation of the execution conditions of tasks which maximizes task-level parallelism, and the optimization of these conditions which results in reducing synchronization overhead imposed by data and control dependences are emphasized. Algorithms for the formation of tasks and their execution conditions based on data and control dependence constraints are presented. The issue of optimization of such conditions is discussed, and optimization algorithms are proposed. The HTG is used as the intermediate representation of parallel Fortran and C programs for generating parallel source as well as parallel machine code.

12.

A Mathematica interface for FormCalc-generated code

T. Hahn 《Computer Physics Communications》2008,178(3):217-221

This note describes a Mathematica interface for Fortran code generated by FormCalc. The interfacing code is set up automatically so that only minuscule changes in the driver files are required. The interface makes a function to compute the cross-section or decay rate available in Mathematica. This function depends on the model parameters chosen for interfacing in the Fortran code.

13.

HOTB: High precision parallel code for calculation of four-particle harmonic oscillator transformation brackets

A. Stepšys S. Mickevicius D. Germanas R.K. Kalinauskas 《Computer Physics Communications》2014

This new version of the HOTB program for calculation of the three- and four-particle harmonic oscillator transformation brackets provides some enhancements and corrections to the earlier version (Germanas et al., 2010) [1]. In particular, the new version allows calculations of harmonic oscillator transformation brackets to be performed in parallel using the MPI parallel communication standard. Moreover, higher precision in intermediate calculations is achieved using GNU Quadruple Precision and the arbitrary-precision library FMLib [2]. A package of Fortran code is presented. The calculation time for large matrices can be significantly reduced using effective parallel code. The use of higher-precision methods in intermediate calculations increases the stability of the algorithms and extends their validity to larger input values.

14.

Optimization and Performance of a Fortran 90 MPI-Based Unstructured Code on Large-Scale Parallel Systems

Shires Dale Mohan Ram 《The Journal of supercomputing》2003,25(2):131-141

The message-passing interface (MPI) has become the standard in achieving effective results when using the message passing paradigm of parallelization. Codes written using MPI are extremely portable and are applicable to both clusters and massively parallel computing platforms. Since MPI uses the single program, multiple data (SPMD) approach to parallelism, good performance requires careful tuning of the serial code as well as careful data and control flow analysis to limit communication. We discuss optimization strategies used and their degree of success to increase performance of an MPI-based unstructured finite element simulation code written in Fortran 90. We discuss performance results based on implementations using several modern massively parallel computing platforms including the SGI Origin 3800, IBM Nighthawk 2 SMP, and Cray T3E-1200.

15.

A comparative study of Java and C performance in two large‐scale parallel applications

Aamir Shafi Bryan Carpenter Mark Baker Aftab Hussain 《Concurrency and Computation》2009,21(15):1882-1906

In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi‐threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full‐scale scientific Java codes, and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread‐safe Java messaging system, MPJ Express. The first application is the Gadget‐2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite‐difference time‐domain method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.

16.

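Rapid Implementation of C# and Fortran Mixed-Language Programming Based on txt Files

郑晖 《Computer & Digital Engineering (计算机与数字工程)》2010,38(10):146-149

A Fortran console application is built into a Fortran dynamic-link library (DLL), and the software interface is developed in C#, which calls the Fortran DLL to achieve mixed-language programming. Key points to note when mixing C# and Fortran are given. Using an example, the paper introduces a method for quickly implementing C# and Fortran mixed-language programming by means of txt files. The method requires only minor changes to the Fortran source code to combine the two languages effectively, and it offers a reference approach for mixed-language programming between other languages.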

17.

MM90: A scalable parallel implementation of the Penn State/NCAR Mesoscale Model (MM5)

J. Michalakes 《Parallel Computing》1997,23(14):7091-2186

This paper describes MM90, a parallel regional weather model based on the Penn State/NCAR MM5. Parallelization of finite differencing, horizontal interpolation, and nesting on distributed-memory (message passing) computers is handled transparently using the RSL library package. Fortran 90 modules, derived data types, dynamic memory allocation, pointers, and recursion are used, making the code modular, flexible, extensible, and run-time configurable. The model can dynamically sense and correct load imbalances. The paper provides performance, scaling, and load-balancing data collected on the IBM SP2 computers at Argonne National Laboratory and NASA Ames Laboratory. Future work will address the impact of parallel modifications on existing modeling software; an approach using commercially available source translation software is described.

18.

Parallel program analysis and restructuring by detection of point-to-point interaction patterns and their transformation into collective communication constructs

Beniamino Di Martino Antonino Mazzeo Nicola Mazzocca Umberto Villano 《Science of Computer Programming》2001,40(2-3):235-263

This paper deals with a technique that can support the re-engineering of parallel programs based on point-to-point communication primitives by detecting typical process interaction patterns in the code. Pattern detection is performed by the static analysis of the parallel program and by solving Diophantine sets of inequalities. The objective is to determine process interactions and to classify them into a set of commonly occurring interaction patterns.

Information on the patterns contained in the program, besides being useful for code comprehension and documentation, makes it possible to obtain more structured and, possibly, efficient versions of the same programs through the use of collective communication constructs. These are primitives for collective data movement or computation often available in current message-passing programming environments.

After the presentation of the basic program analysis technique, several examples involving the detection of common communication patterns are shown. Then the structure of PPAR, a prototype tool that allows the analysis of parallel programs written in Fortran 77 with calls to PVM or MPI unstructured communication primitives, is outlined, and conclusions are drawn.

19.

Parallelizing subroutines in sequential programs

Chih-Ping Chu Carver D.L. 《Software, IEEE》1994,11(1):77-85

An algorithm for making sequential programs parallel is described, which first identifies all subroutines, then determines the appropriate execution mode and restructures the code. It works recursively to parallelize the entire program. We use Fortran in our work, but many of the concepts apply to other languages. Our hardware model is a shared-memory multiprocessor system with a fixed number of identical processors, each with its own local memory connected to a common memory that is accessible to all processors equally. The model implements interprocessor synchronization and communication via special memory locations or special storage. Systems like the Cray X-MP, IBM 3090, and Alliant FX/8 fit this model. Our input is a sequential, structured Fortran program with no overlapping branches. With today's emphasis on writing structured code, this restriction is reasonable. A prototype of a system to implement the algorithm is under development on an IBM 3090 multiprocessor.

20.

Optimizations of a GPU accelerated heat conduction equation by a programming of CUDA Fortran from an analysis of a PTX file

Shin-ichi Satake Hajime Yoshimori Takayuki Suzuki 《Computer Physics Communications》2012,183(11):2376-2385

The Fortran language has been commonly used for many kinds of scientific computation. In this paper, we focus on the solution of an unsteady heat conduction equation, which is one of the simplest problems for thermal dynamics. Recently, a GPU (graphics processing unit) has been enhanced with a Fortran programming language capability employing CUDA (compute unified device architecture), known as CUDA Fortran. We find that the speed performance of a system using an ordinary program coding of CUDA Fortran is lower than that of systems using a program coding of CUDA C. We also find that intermediate assembly files PTX (parallel thread execution) of the two languages are not coincident. Therefore, by comparing the PTX files from the two coding programs we could detect the bottleneck that causes the speed reduction. We propose three optimization techniques that can enable the calculated speeds using CUDA Fortran and CUDA C to be coincident. The optimizations can be performed by the Fortran language when improved by an analyzed PTX file. It is thus possible to improve the performance of CUDA Fortran by adding a correction to it, which happens to be at a programming language level.
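
For readers unfamiliar with the problem class, the sketch below shows one common way to advance an unsteady heat conduction equation in time: an explicit forward-time, centered-space (FTCS) update in one dimension. The abstract does not state the discretization, dimensionality, or boundary conditions the authors used, so everything here is an assumption, and this is plain Python rather than CUDA Fortran. The names `ftcs_step` and `solve` are hypothetical.

```python
def ftcs_step(u, r):
    """One explicit step of u_t = alpha * u_xx with r = alpha * dt / dx**2.

    The two end values are held fixed (assumed Dirichlet boundaries).
    """
    new = list(u)
    for i in range(1, len(u) - 1):
        # Each interior point depends only on old values, so the updates are
        # independent of one another within a step.
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

def solve(u0, r, steps):
    """Apply `steps` explicit updates; this scheme is stable only for r <= 0.5."""
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for r > 0.5")
    u = list(u0)
    for _ in range(steps):
        u = ftcs_step(u, r)
    return u

u0 = [0.0] * 11
u0[5] = 1.0  # initial hot spot in the middle of the rod
u = solve(u0, r=0.25, steps=50)
print(max(u))
```

Because every interior point in a step is computed from the previous step's values alone, stencil updates of this kind are typically mapped to one GPU thread per grid point.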