Similar Documents
1.
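The scalability of distributed-memory parallel computers makes them strong candidates for solving large problems. New languages such as HPF, Fortran D, and VPP Fortran make it easy to port existing software to these machines. Many distributed-memory parallel computers have already been built, but none of them supports the mechanisms these languages require. We have developed a new distributed-memory parallel computer, the AP1000+, an enhanced version of the AP1000. Using VPP Fortran and C scientific applications such as the NAS Parallel Benchmarks, we simulate…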

2.
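Call graphs are the foundation of interprocedural analysis and automatic program parallelization, and generating a precise call graph allows more of a program's parallelism to be exploited. For Fortran programs, this paper proposes a technique, with a corresponding algorithm, that completely eliminates dummy procedures (procedures passed as arguments) and produces a precise call graph. The algorithm has been implemented in an automatic parallelization tool for MPP Fortran.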
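A minimal sketch of how calls through dummy procedures can be resolved, assuming a toy program representation (procedure name mapped to its formal-argument list, plus each procedure's call sites as (callee, actual arguments)). The representation, the function `resolve_call_graph`, and the example procedures are hypothetical, and this context-insensitive fixpoint is not necessarily the algorithm the paper proposes:

```python
from typing import Dict, List, Set, Tuple

# A call site: (name written as the callee, names written as actual arguments).
Call = Tuple[str, List[str]]


def resolve_call_graph(formals: Dict[str, List[str]],
                       calls: Dict[str, List[Call]]) -> Dict[str, Set[str]]:
    """Return caller -> set of real procedures it may call.

    Calls through a dummy procedure are replaced by every actual procedure
    that can be bound to it; bindings are propagated to a fixpoint.
    """
    procs = set(formals)
    # binding[(p, f)]: actual procedures that dummy argument f of p may denote.
    binding: Dict[Tuple[str, str], Set[str]] = {
        (p, f): set() for p, fs in formals.items() for f in fs}

    def targets(p: str, name: str) -> Set[str]:
        """Procedures that `name`, written inside procedure p, may denote."""
        if name in formals[p]:
            return binding[(p, name)]
        return {name} if name in procs else set()

    changed = True
    while changed:
        changed = False
        for p, sites in calls.items():
            for callee, actuals in sites:
                for q in list(targets(p, callee)):
                    # Bind each formal of q to whatever the matching actual denotes.
                    for f, a in zip(formals[q], actuals):
                        new = targets(p, a) - binding[(q, f)]
                        if new:
                            binding[(q, f)] |= new
                            changed = True

    return {p: set().union(*(targets(p, c) for c, _ in calls.get(p, [])))
            for p in formals}


# Hypothetical example: INTEG(F, A, B) calls its dummy procedure F.
formals = {"MAIN": [], "DRIVER": ["G"], "INTEG": ["F", "A", "B"],
           "SQ": ["X"], "CUBE": ["X"]}
calls = {"MAIN": [("INTEG", ["SQ", "A", "B"]), ("DRIVER", ["CUBE"])],
         "DRIVER": [("INTEG", ["G", "A", "B"])],
         "INTEG": [("F", ["A"])]}
graph = resolve_call_graph(formals, calls)
```

In this example the call to `F` inside `INTEG` resolves to both `SQ` (passed directly from `MAIN`) and `CUBE` (passed through `DRIVER`'s dummy procedure `G`), so no edge to a dummy name remains in the graph.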

3.
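Existing Connection Machine (CM) Fortran codes developed for the CM-2 and CM-5 represent an important class of parallel applications. Over the past 5-6 years, some users have run CM Fortran codes in production on the CM-2 and CM-5, a major investment in terms of cost. When Thinking Machines Corporation decided to withdraw from the hardware business and retire the CM-2 and CM-5, the best way to protect this substantial investment in CM Fortran code was to port it to High Performance Fortran (HPF) on highly parallel systems. HPF…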

4.
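This paper analyzes the storage format of data files accessed sequentially as binary data under the 3L Parallel Fortran compilation environment and under the MS-Fortran and NDP-Fortran serial compilation environments. It proposes and implements methods for designing data-file interfaces with serial and parallel Fortran, respectively, and gives suitable examples.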
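A minimal sketch of the reading and writing side of such an interface, assuming the record layout gfortran uses by default for sequential unformatted files (a 4-byte length marker before and after each record's payload). The 3L Parallel Fortran, MS-Fortran, and NDP-Fortran layouts analyzed in the paper may differ, so `read_records`, `write_record`, and the marker size are assumptions:

```python
import io
import struct
from typing import BinaryIO, Iterator


def read_records(f: BinaryIO, endian: str = "<") -> Iterator[bytes]:
    """Yield the payload of each record in a sequential unformatted file.

    Assumes each record is framed as [4-byte length][payload][4-byte length].
    """
    fmt = endian + "i"
    while True:
        head = f.read(4)
        if not head:
            return  # clean end of file
        if len(head) < 4:
            raise ValueError("truncated leading record marker")
        (n,) = struct.unpack(fmt, head)
        payload = f.read(n)
        tail = f.read(4)
        if len(payload) != n or len(tail) < 4 or struct.unpack(fmt, tail)[0] != n:
            raise ValueError("record markers do not match")
        yield payload


def write_record(f: BinaryIO, payload: bytes, endian: str = "<") -> None:
    """Write one record with matching leading and trailing length markers."""
    marker = struct.pack(endian + "i", len(payload))
    f.write(marker + payload + marker)


# Usage: round-trip one record holding three REAL*4 values.
buf = io.BytesIO()
write_record(buf, struct.pack("<3f", 1.0, 2.0, 3.0))
buf.seek(0)
values = [struct.unpack("<3f", rec) for rec in read_records(buf)]
```

The `endian` parameter is there because byte order is one of the things that typically differs between the machines on either side of such an interface.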

5.
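Parallel Implementation of the C++ Language on Cluster Systems   Total citations: 2 (self: 0, others: 2)
In a cluster computing environment, combining object-oriented programming with parallel techniques can effectively reduce the difficulty of parallel programming and improve the maintainability, portability, and reusability of parallel programs. This paper examines several approaches to parallelizing C++ on cluster systems. It presents the language models, programming interfaces, and implementations of message-passing-based MPC++, shared-object-based SOC++, and object-level-parallel CCPP, and gives and analyzes evaluation results for these language systems.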

6.
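The VPP500 vector parallel processor is a highly parallel distributed-memory supercomputer with a performance range of 6.4 to 355 GFLOPS and a main memory capacity of 1 to 222 GB. The system supports 4 to 222 processors interconnected by a high-bandwidth crossbar network. Three key features that set the VPP500 clearly apart from current massively parallel systems determine its architecture. First, its building blocks are 1.6 GFLOPS vector processors, an order of magnitude faster than the processors used in massively parallel processors (MPPs). This extremely high single-processor performance reduces the system…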
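As a quick consistency check, the quoted performance range is the per-processor peak multiplied by the processor count, assuming peak performance scales linearly with the number of processors $P$:

$$
R_{\mathrm{peak}}(P) = 1.6\,\mathrm{GFLOPS} \times P,\qquad
R_{\mathrm{peak}}(4) = 6.4\,\mathrm{GFLOPS},\qquad
R_{\mathrm{peak}}(222) = 355.2\,\mathrm{GFLOPS} \approx 355\,\mathrm{GFLOPS}
$$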

7.
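MCIM: A Parallel System Architecture with a Memory-Centered Interconnection Mechanism   Total citations: 4 (self: 2, others: 2)
The interconnection network between the nodes of a parallel system is a key research area for high-performance computers. For more than 30 years, interconnection networks have traditionally been built from logic circuits [1]. However, system architectures and their concepts should evolve as computer technology advances. This paper proposes a new parallel system architecture that uses multi-port fast static memory as the interconnection mechanism between node machines (MCIM). Compared with traditional logic-circuit interconnection networks, MCIM can reduce message-passing latency in MPP systems, overcome the bus transfer-rate bottleneck of network adapter cards in network parallel computing (NPC) systems, and greatly reduce network protocol overhead. With current VLSI technology, MCIM can be implemented at low cost and with high efficiency. The paper discusses the MCIM parallel system architecture and the arbitration and selection of communication paths. It also describes an MCIM simulation tool and gives experimental results.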
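The abstract mentions arbitration of communication paths but does not say which policy MCIM uses. As a hypothetical illustration only, here is a round-robin arbiter that grants one of several nodes access to a shared memory port per cycle; the class name and the policy are assumptions, chosen because round-robin is a common, fair scheme:

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grant one of n requesters per cycle, rotating priority after each grant.

    Hypothetical sketch: not the arbitration scheme described in the paper.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.next_priority = 0  # requester examined first in the next cycle

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted requester, or None if nobody requests."""
        for offset in range(self.n):
            i = (self.next_priority + offset) % self.n
            if requests[i]:
                self.next_priority = (i + 1) % self.n  # winner moves to lowest priority
                return i
        return None
```

Because the winner drops to the lowest priority, a node that requests continuously cannot starve the others sharing the same port.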

8.
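FAX: An Automatic Program Parallelization Tool   Total citations: 1 (self: 0, others: 1)
This paper gives an overview of FAX (Fortran Automated Xlator), an automatic program parallelization tool for massively parallel processing systems, focusing on the advanced techniques it employs. Test results show that FAX is already reasonably usable and effective and that, as an automatic parallelization tool for distributed-memory parallel systems, it has essentially met its design goals.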

9.
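HPF (High Performance Fortran) is a parallel language based on data-partitioning directives. Determining a program's computation partitioning from its data partitioning is the first fundamental problem an HPF compiler must solve. This paper introduces the concepts of data partitioning and computation partitioning in HPF and, using a triply nested loop as an example, intuitively presents an algorithm for deriving the computation partitioning.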
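A minimal sketch of deriving a computation partition from a data partition under the owner-computes rule, assuming a one-dimensional array `A(1:n)` with an HPF `BLOCK` distribution (block size ceil(n/p)) and a loop `DO i = l, u` whose left-hand side is `A(i + c)`. The function names are hypothetical, and the paper's algorithm for the triply nested case may differ:

```python
from typing import Tuple


def block_bounds(n: int, p: int, rank: int) -> Tuple[int, int]:
    """1-based range [lo, hi] of A(1:n) owned by `rank` under BLOCK on p processors.

    The range is empty (lo > hi) when trailing processors own nothing.
    """
    b = -(-n // p)  # ceil(n / p), the HPF BLOCK size
    return rank * b + 1, min(n, (rank + 1) * b)


def local_iterations(l: int, u: int, c: int, n: int, p: int, rank: int) -> range:
    """Iterations of `DO i = l, u` with LHS A(i + c) that `rank` executes.

    Owner computes: rank runs iteration i exactly when it owns A(i + c),
    i.e. lo <= i + c <= hi, intersected with the loop bounds.
    """
    lo, hi = block_bounds(n, p, rank)
    return range(max(l, lo - c), min(u, hi - c) + 1)
```

For a triply nested loop writing `A(i + c, j, k)` with `A` distributed `(BLOCK, *, *)`, only the bounds of the `i` loop would be localized this way; the `j` and `k` loops stay unchanged on every processor.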

10.
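Compilation Techniques for Alternate Returns from Subprograms in Fortran 90   Total citations: 1 (self: 0, others: 1)
Fortran 90 is the latest international standard for the Fortran language. For compatibility reasons, a Fortran 90 subprogram can have multiple exits. The target language of the compiler described here is C, and under C semantics a function has only one exit: it always returns to the point of call. This semantic gap between the source and target languages makes the compiler harder to implement. The paper focuses on the characteristics of alternate returns from Fortran 90 subprograms and on the techniques for compiling them.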
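One common way to bridge this gap is to have the translated subprogram return an integer that identifies which alternate return was taken, and to have the caller branch on it. The sketch below shows that lowering for a hypothetical `SUBROUTINE CHECK(X, *, *)` called as `CALL CHECK(V, *10, *20)`; it is written in Python to match the other examples here (in C, `check` would return an `int` and `caller` would `switch` on it), and it is not necessarily the paper's exact scheme:

```python
def check(x: float) -> int:
    """Lowered SUBROUTINE CHECK(X, *, *); returns the alternate-return index."""
    if x < 0:
        return 1  # RETURN 1 -> first label argument
    if x == 0:
        return 2  # RETURN 2 -> second label argument
    return 0      # plain RETURN -> continue after the CALL


def caller(v: float) -> str:
    """Lowered CALL CHECK(V, *10, *20) followed by the labelled statements."""
    k = check(v)
    # The caller dispatches on the index, as a C switch would on the result.
    if k == 1:
        return "label 10"
    if k == 2:
        return "label 20"
    return "fall through"
```

The single C exit is preserved because control always returns to the call site; the branch to label 10 or 20 happens there instead of inside the subprogram.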

11.
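Automatic Data Distribution for MPP Fortran   Total citations: 2 (self: 0, others: 2)
Tang Xinchun, Guo Kerong. Journal of Software (软件学报), 1998, 9(2): 144-150
Automatic data distribution is a key technique in automatic program parallelization for massively parallel processing (MPP) systems. The choice of data distribution directly affects the parallel performance of an application on an MPP system. Taking MPP Fortran as an example, this paper discusses in detail the techniques involved in automatic data distribution, such as alignment analysis, generation of distribution schemes, static performance estimation, and data redistribution, and proposes corresponding algorithms. These algorithms will be implemented in the authors' automatic parallelization tool for MPP Fortran.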
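As a toy illustration of static performance estimation, the sketch below counts non-local references for the stencil `DO i = 2, n-1: A(i) = B(i-1) + B(i+1)`, assuming `A` and `B` are aligned and share the same distribution, and that the cost model is simply the number of remote reads. The cost model and all names are hypothetical; the paper's estimator is not described in the abstract:

```python
from typing import Callable, Sequence

Owner = Callable[[int, int, int], int]  # (1-based index, n, p) -> 0-based processor


def owner_block(i: int, n: int, p: int) -> int:
    """Owner of element i under BLOCK (block size ceil(n/p))."""
    return (i - 1) // -(-n // p)


def owner_cyclic(i: int, n: int, p: int) -> int:
    """Owner of element i under CYCLIC."""
    return (i - 1) % p


def remote_refs(owner: Owner, n: int, p: int, offsets: Sequence[int] = (-1, 1)) -> int:
    """Count reads B(i + d) that live off the processor owning A(i)."""
    count = 0
    for i in range(2, n):
        me = owner(i, n, p)
        for d in offsets:
            if owner(i + d, n, p) != me:
                count += 1
    return count
```

For n = 16 and p = 4, `BLOCK` yields 6 remote reads (only at block boundaries) while `CYCLIC` yields 28 (every neighbor is remote), so even this crude estimate picks `BLOCK` for nearest-neighbor stencils.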

12.
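This paper gives an overview of FAX (Fortran Automated Xlator), an automatic program parallelization tool for massively parallel processing systems, focusing on the advanced techniques it employs. Test results show that FAX is already reasonably usable and effective and that, as an automatic parallelization tool for distributed-memory parallel systems, it has essentially met its design goals.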

13.
In this paper we discuss the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2, while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures, and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
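Each ADI half-step reduces to many independent tridiagonal systems, one per grid line. A minimal sketch of the Gaussian elimination variant for such a system (the Thomas algorithm), assuming a diagonally dominant matrix so that no pivoting is needed; the function name and argument layout are illustrative, and the cyclic elimination used on the MPP is not shown:

```python
from typing import List, Sequence


def thomas(a: Sequence[float], b: Sequence[float],
           c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a tridiagonal system by Gaussian elimination without pivoting.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal after forward elimination
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]  # pivot after eliminating a[i]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The forward and backward sweeps are inherently sequential within one system, which is why machines like the Flex/32 and Cray/2 can instead exploit parallelism across the many independent grid lines.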

14.
In this paper, we describe the results of several tests that check the accuracy of numerical computation on the Cray supercomputer in vector and scalar modes. The known tests were modified to identify the critical point where roundings start causing problems. After describing the tests, we present an interval library called libavi.a. It was developed in Fortran 90 on the Cray Y-MP2E supercomputer of UFRGS-Brazil. This library makes interval mathematics accessible to users of Cray supercomputers. It works with real and complex intervals and with interval matrices and vectors. The library allows overloading of operators and functions. It is organized in four modules: real intervals, interval vectors and matrices, complex intervals, and linear algebra applications.
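A minimal sketch of an interval type with overloaded operators, assuming outward rounding is approximated by widening each round-to-nearest result by one unit in the last place with `math.nextafter` (Python 3.9+). This is slightly wider than true directed rounding but still encloses the exact result; the class and its methods are illustrative and not the libavi.a interface:

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Round-to-nearest is within half an ulp, so widening by one ulp encloses the exact sum.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other: "Interval") -> "Interval":
        # The product's extremes lie among the four endpoint products.
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(ps), -math.inf),
                        math.nextafter(max(ps), math.inf))

    def __contains__(self, x: float) -> bool:
        return self.lo <= x <= self.hi


# Usage: the computed interval encloses the exact sum of three binary 0.1s.
tenth = Interval(0.1, 0.1)
s = tenth + tenth + tenth
exact = 3 * Fraction(0.1)  # exact rational value of 3 * float(0.1)
m = Interval(-1.0, 2.0) * Interval(3.0, 4.0)  # encloses [-4, 8]
```

Plain floating-point evaluation of `0.1 + 0.1 + 0.1` gives a single rounded number with no error bound; the interval version trades a little width for a guaranteed enclosure.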

15.
Parallel programming with Polaris   Total citations: 1 (self: 0, others: 1)
Parallel programming tools are limited, making effective parallel programming difficult and cumbersome. Compilers that translate conventional sequential programs into parallel form would liberate programmers from the complexities of explicit, machine-oriented parallel programming. The paper discusses parallel programming with Polaris, an experimental translator of conventional Fortran programs that targets machines such as the Cray T3D.
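Translators of this kind must prove that loop iterations do not conflict before running them in parallel. As an illustration of the simplest such check, here is the classic GCD dependence test for a write to `X(a*i + b)` and a read of `X(c*i + d)` in the same loop; it is a textbook test shown for intuition only, and the abstract does not say which tests Polaris uses:

```python
from math import gcd


def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """Return False if X(a*i + b) and X(c*i + d) can never name the same element.

    A dependence requires an integer solution of a*i1 - c*i2 = d - b, which
    exists only if gcd(a, c) divides d - b. True means a dependence is merely
    possible: the test ignores loop bounds.
    """
    g = gcd(a, c)
    if g == 0:
        return b == d  # both subscripts are constants
    return (d - b) % g == 0
```

For `X(2*i) = X(2*i + 1)` the test returns `False`, since even and odd elements never overlap, so the loop can run in parallel; for `X(i) = X(i - 1)` it returns `True`, and a more precise test or a sequential loop is needed.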

16.
An algorithm for making sequential programs parallel is described, which first identifies all subroutines, then determines the appropriate execution mode and restructures the code. It works recursively to parallelize the entire program. We use Fortran in our work, but many of the concepts apply to other languages. Our hardware model is a shared-memory multiprocessor system with a fixed number of identical processors, each with its own local memory connected to a common memory that is accessible to all processors equally. The model implements interprocessor synchronization and communication via special memory locations or special storage. Systems like the Cray X-MP, IBM 3090, and Alliant FX/8 fit this model. Our input is a sequential, structured Fortran program with no overlapping branches. With today's emphasis on writing structured code, this restriction is reasonable. A prototype of a system to implement the algorithm is under development on an IBM 3090 multiprocessor.

17.
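A Multi-Version Technique for External Procedure Calls in MPP Fortran Programs   Total citations: 3 (self: 0, others: 3)
When writing MPP Fortran parallel programs, if the wrong version of an external procedure is selected at a call site, or if the shared/private attributes of the dummy and actual arguments do not match, the program may fail or its performance may degrade. This paper proposes an effective solution: a multi-version technique for external procedure calls.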
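A minimal sketch of one way to plan such versions, assuming each call site is summarized by the shared/private kind of each actual argument and that one clone of the procedure is compiled per distinct combination actually seen. The function `plan_versions`, the naming scheme, and the procedure `SOLVE` are hypothetical:

```python
from typing import Dict, List, Sequence, Tuple

Kinds = Tuple[str, ...]  # per-argument "shared" or "private"


def plan_versions(proc: str,
                  call_sites: Sequence[Sequence[str]]) -> Tuple[Dict[Kinds, str], List[str]]:
    """Return (clone table, version bound at each call site).

    Only combinations that actually occur get a clone, which avoids
    generating all 2**n versions for an n-argument procedure.
    """
    table: Dict[Kinds, str] = {}
    bindings: List[str] = []
    for kinds in call_sites:
        key = tuple(kinds)
        if key not in table:
            suffix = "".join(k[0].upper() for k in key)  # e.g. "SP" = (shared, private)
            table[key] = f"{proc}_{suffix}"
        bindings.append(table[key])
    return table, bindings


table, bindings = plan_versions(
    "SOLVE", [("shared", "private"), ("shared", "shared"), ("shared", "private")])
```

Each call site is then rewritten to call the clone whose dummy-argument kinds match its actual arguments, so no call passes a private actual to a dummy compiled as shared, or vice versa.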

18.
The message-passing interface (MPI) has become the standard for achieving effective results with the message-passing paradigm of parallelization. Codes written using MPI are extremely portable and are applicable to both clusters and massively parallel computing platforms. Since MPI uses the single program, multiple data (SPMD) approach to parallelism, good performance requires careful tuning of the serial code as well as careful data and control flow analysis to limit communication. We discuss the optimization strategies used to increase the performance of an MPI-based unstructured finite element simulation code written in Fortran 90, and their degree of success. We discuss performance results based on implementations on several modern massively parallel computing platforms, including the SGI Origin 3800, IBM Nighthawk 2 SMP, and Cray T3E-1200.

19.
Most implementations of a radix-2 fast Fourier transform on large scientific computers use algorithms that involve memory accesses whose strides are powers of two. (The term stride means the memory increment between successive elements stored or fetched.) Such strides are unacceptable for recently developed supercomputers, particularly the Cray-2, because of serious difficulties with memory bank conflicts. This article describes an algorithm for evaluating the fast Fourier transform that avoids this difficulty and thus could provide the basis for implementations that more fully utilize the power of the Cray-2. A Fortran program implementing this algorithm is included, and timing comparisons with the Cray assembly-coded library subroutine are shown. The author is with the Numerical Aerodynamic Simulation Systems Division at NASA Ames Research Center.
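To see why power-of-two strides cause bank conflicts, the sketch below counts how many distinct banks a strided access stream touches, assuming simple low-order interleaving (bank = word address mod number of banks) and a hypothetical 64-bank memory. It illustrates the problem only; it is not the article's FFT algorithm:

```python
from math import gcd


def banks_touched(stride: int, n_banks: int, n_accesses: int) -> int:
    """Distinct banks hit by n_accesses words fetched at a fixed stride.

    Assumes low-order interleaving: word address k*stride lives in bank
    (k*stride) mod n_banks. Over enough accesses this equals
    n_banks // gcd(stride, n_banks).
    """
    return len({(k * stride) % n_banks for k in range(n_accesses)})


N_BANKS = 64  # hypothetical bank count for illustration
for stride in (1, 32, 33, 64):
    assert banks_touched(stride, N_BANKS, N_BANKS) == N_BANKS // gcd(stride, N_BANKS)
```

With stride 64 every access hits the same bank, so each fetch waits for that bank to recover, whereas stride 33 (for example, a 64-element leading dimension padded to 65 with stride 33 chosen for illustration) spreads accesses over all 64 banks.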

20.
This paper presents SUPPLE (SUPort for Parallel Loop Execution), an innovative run-time support for the execution of parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies upon a static block data distribution to exploit locality, and combines static and dynamic policies for scheduling non-uniform iterations. It adopts, as far as possible, a static scheduling policy derived from the owner computes rule, and moves data and iterations among processors only if a load imbalance actually occurs. SUPPLE always tries to overlap communications with useful computations by reordering loop iterations and prefetching remote ones in the case of workload imbalance. The SUPPLE approach has been validated by many experimental results obtained by running a multi-dimensional flame simulation kernel on a 64-node Cray T3D. We have fed the benchmark code with several synthetic input data sets built on the basis of a load imbalance model. We have compared our results with those obtained with a CRAFT Fortran implementation of the benchmark.
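A minimal simulation of the general idea (static block scheduling first, migrating iterations only once a processor actually runs out of work), assuming communication is free and that an idle processor takes the last pending iteration of the most loaded processor. The functions and the migration policy are hypothetical simplifications; SUPPLE's prefetching and communication overlap are not modeled:

```python
from collections import deque
from typing import List, Sequence


def makespan_static(costs: Sequence[float], p: int) -> float:
    """Finish time when iterations are split into p contiguous blocks."""
    b = -(-len(costs) // p)  # ceil(len / p)
    return max(sum(costs[r * b:(r + 1) * b]) for r in range(p))


def makespan_with_migration(costs: Sequence[float], p: int) -> float:
    """Start from the static blocks; an idle processor steals remote iterations."""
    b = -(-len(costs) // p)
    queues: List[deque] = [deque(costs[r * b:(r + 1) * b]) for r in range(p)]
    clock = [0.0] * p
    active = set(range(p))
    while active:
        r = min(active, key=lambda q: clock[q])  # next processor to become free
        if queues[r]:
            clock[r] += queues[r].popleft()  # run its own next iteration
            continue
        victim = max(range(p), key=lambda q: sum(queues[q]))
        if queues[victim]:
            clock[r] += queues[victim].pop()  # migrate one pending iteration
        else:
            active.discard(r)  # no work left anywhere
    return max(clock)


costs = [1.0] * 4 + [10.0] * 4  # non-uniform iteration costs
```

With these costs on two processors the static schedule finishes at time 40, while migrating iterations only after the first processor goes idle brings the finish time down to 24.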
