1.

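沈斌 赵秉中 《电子计算机》1995,(6):23-40

The scalability of distributed-memory parallel computers makes them strong candidates for solving large problems. New languages such as HPF, Fortran D, and VPP Fortran allow existing software to be ported to these machines easily. Many distributed-memory parallel computers have been built, but none of them supports the mechanisms these languages require. We developed a distributed-memory parallel computer, the AP1000+, an enhanced version of the AP1000. Using scientific applications written in VPP Fortran and C, such as the NAS Parallel Benchmarks, we simulated ...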

2.

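A precise call graph generation technique

唐新春 郭春榕 《计算机工程与设计》1997,18(5):34-39

The call graph is the foundation of interprocedural analysis and automatic program parallelization. Generating a precise call graph makes it possible to exploit more of a program's parallelism. For Fortran programs, this paper proposes a technique, with a corresponding algorithm, that completely eliminates dummy procedures and produces a precise call graph. The algorithm has been implemented in an automatic program parallelization tool targeting MPP Fortran.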
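As a rough illustration of what eliminating dummy procedures involves, the Python sketch below propagates procedure names passed as actual arguments to the dummy procedure parameters that receive them, iterating until nothing changes. The program representation (`params`, `calls`) and the name `precise_call_graph` are hypothetical; this is not the authors' algorithm, only one simple way to resolve such calls.

```python
from collections import defaultdict

def precise_call_graph(procs):
    """procs maps a procedure name to a dict with
         'params': list of formal parameter names
         'calls' : list of (callee, [actual arguments]) pairs.
    A callee or actual argument is either a real procedure name or a
    formal parameter of the enclosing procedure (a dummy procedure).
    Returns a dict: procedure -> set of real procedures it may call."""
    # bindings[(proc, formal)] = real procedures the formal may denote
    bindings = defaultdict(set)

    def resolve(proc, name):
        # A formal parameter denotes whatever has been bound to it so far;
        # any other name denotes itself if it is a known procedure.
        if name in procs[proc]['params']:
            return set(bindings[(proc, name)])
        return {name} if name in procs else set()

    changed = True
    while changed:                      # iterate to a fixpoint
        changed = False
        for proc, info in procs.items():
            for callee, actuals in info['calls']:
                for target in resolve(proc, callee):
                    formals = procs[target]['params']
                    for formal, actual in zip(formals, actuals):
                        new = resolve(proc, actual) - bindings[(target, formal)]
                        if new:
                            bindings[(target, formal)] |= new
                            changed = True

    graph = {}
    for proc, info in procs.items():
        graph[proc] = set()
        for callee, _ in info['calls']:
            graph[proc] |= resolve(proc, callee)
    return graph

# Hypothetical program: MAIN passes F and G to INTEG, which calls its dummy FUNC.
procs = {
    'MAIN':  {'params': [], 'calls': [('INTEG', ['F', 'A']), ('INTEG', ['G', 'B'])]},
    'INTEG': {'params': ['FUNC', 'X'], 'calls': [('FUNC', ['X'])]},
    'F':     {'params': ['X'], 'calls': []},
    'G':     {'params': ['X'], 'calls': []},
}
graph = precise_call_graph(procs)
```

In the resulting graph the edge from `INTEG` to the dummy `FUNC` is replaced by edges to the real procedures `F` and `G`.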

3.

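NAS experience porting CM Fortran codes to HPF on the IBM SP2 and SGI Power Challenge

龚铿 凌晨 《电子计算机》1997,(1):30-37

The current Connection Machine (CM) Fortran codes developed for the CM-2 and CM-5 represent an important class of parallel applications. Over the past five to six years, a number of users have run CM Fortran codes in production mode on the CM-2 and CM-5, which amounts to a major investment in terms of cost. When Thinking Machines Corporation decided to withdraw from the hardware business and retire the CM-2 and CM-5 machines, the best way to protect the substantial investment in CM Fortran codes was to port them to High Performance Fortran (HPF) on highly parallel systems. HPF ...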

4.

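Data file interfaces between serial and parallel FORTRAN

李明瑞 程建钢 《小型微型计算机系统》1995,16(6):33-38

This paper analyzes the storage formats used when data files are accessed sequentially as binary code under the 3L Parallel Fortran compilation environment and under the MS-Fortran and NDP-Fortran serial compilation environments. It proposes and implements methods for designing data file interfaces between serial and parallel Fortran, and gives suitable examples.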

5.

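Implementing parallel C++ on cluster systems Total citations: 2, self-citations: 0, citations by others: 2

温冬蝉 王鼎兴 《计算机学报》1997,(1)

In a computer cluster environment, combining object-oriented programming with parallel techniques can effectively reduce the difficulty of parallel programming and improve the maintainability, portability, and reusability of parallel programs. This paper explores several approaches to parallelizing C++ on cluster systems. It presents the language models, programming interfaces, and implementations of MPC++ (based on message passing), SOC++ (based on shared objects), and CCPP (based on object-level parallelism), and gives evaluation results and analysis for these language systems.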

6.

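Architecture of the VPP500 parallel supercomputer

王广益 《电子计算机》1996,(6):42-50

The VPP500 vector parallel processor is a highly parallel distributed-memory supercomputer with a performance range of 6.4 to 355 GFLOPS and a main memory capacity of 1 to 222 GB. The system supports 4 to 222 processors interconnected by a high-bandwidth crossbar network. Three key features that set the VPP500 sharply apart from current massively parallel systems determine its architecture. First, its building block is a 1.6 GFLOPS vector processor, an order of magnitude faster than the processors used in massively parallel processors (MPPs). This very high single-processor performance reduces the system ...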

7.

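MCIM: a parallel system architecture with a memory-centered interconnection mechanism Total citations: 4, self-citations: 2, citations by others: 2

李三立 武剑锋 《小型微型计算机系统》1997,18(2):1-9

The interconnection network between the nodes of a parallel system is a key research area in high-performance computing. For more than 30 years, interconnection networks have traditionally been built from logic circuits [1]. However, system architecture and its concepts should change as computer technology advances. This paper proposes a new parallel system architecture that uses multi-port fast static memory as the interconnection mechanism between node machines, called MCIM. Compared with traditional logic-circuit interconnection networks, MCIM can reduce message-passing latency in MPP systems, overcome the bus transfer-rate bottleneck of network adapter cards in network parallel computing (NPC) systems, and greatly reduce network protocol overhead. With current VLSI technology, MCIM can be implemented at low cost and high efficiency. The paper discusses the MCIM parallel system architecture and the arbitration and selection of communication paths. It also describes an MCIM simulation tool and gives experimental results.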

8.

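FAX, an automatic program parallelization tool Total citations: 1, self-citations: 0, citations by others: 1

郭克榕 唐新春 《计算机工程与应用》1999,35(9):36-38,43

This paper gives a system overview of FAX (Fortran Automated Xlator), an automatic program parallelization tool for massively parallel processing systems, and focuses on the advanced techniques used in FAX. Test results show that FAX is already reasonably usable and effective and that, as an automatic parallelization tool for distributed-memory parallel machines, it has largely met its design goals.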

9.

Algorithm implementation of HPF computation partitioning

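仲跻冬 李晓明 《计算机工程与科学》1997,19(2):55-58

HPF (High Performance Fortran) is a parallel language based on data-partitioning specifications. How to determine a program's computation partitioning from its data partitioning is the basic problem an HPF compiler must solve first. This paper introduces the concepts of data partitioning and computation partitioning in HPF and, using a three-level nested loop as an example, presents an intuitive algorithm for deriving the computation partitioning.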
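One common way to derive a computation partitioning from a data partitioning is the owner-computes rule: each processor executes the iterations that assign to array elements it owns. The Python sketch below applies that rule to a single loop over a one-dimensional array with a BLOCK distribution. It is a simplified illustration under those assumptions, not the paper's algorithm for three-level nested loops, and the function names are hypothetical.

```python
def block_bounds(n, nprocs, p):
    """Global index range [lo, hi] (1-based, inclusive) owned by processor p
    (0-based) when an n-element array is BLOCK-distributed over nprocs
    processors with block size ceil(n / nprocs)."""
    size = -(-n // nprocs)              # ceiling division
    lo = p * size + 1
    hi = min(n, (p + 1) * size)
    return lo, hi                       # lo > hi: the processor owns nothing

def local_iterations(n, nprocs, p, loop_lo, loop_hi, offset=0):
    """Iterations i in [loop_lo, loop_hi] that processor p executes for the
    statement A(i + offset) = ... under the owner-computes rule."""
    lo, hi = block_bounds(n, nprocs, p)
    first = max(loop_lo, lo - offset)
    last = min(loop_hi, hi - offset)
    return range(first, last + 1)       # empty when first > last

# Loop "DO i = 1, 9: A(i+1) = ..." on a 10-element array over 4 processors.
parts = [list(local_iterations(10, 4, p, 1, 9, offset=1)) for p in range(4)]
```

Each processor receives a contiguous slice of the iteration space, and together the slices cover every iteration of the loop exactly once.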

10.

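Compilation techniques for alternate returns from subprograms in Fortran 90 Total citations: 1, self-citations: 0, citations by others: 1

徐赤斌 程虎 《计算机工程》1996,22(6):3-8

Fortran 90 is the latest international standard for the Fortran language. For reasons of language compatibility, a subprogram in Fortran 90 may have multiple exits. The target language of the compilation system described here is the C programming language, and under C semantics a function can have only one exit. This semantic gap between the source and target languages makes the compiler harder to implement. The paper focuses on the characteristics of alternate returns from subprograms in Fortran 90 and on the techniques used to compile them.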

11.

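Automatic data distribution for MPP Fortran Total citations: 2, self-citations: 0, citations by others: 2

唐新春 郭克榕 《软件学报》1998,9(2):144-150

Automatic data distribution is a key technique in automatic program parallelization for massively parallel processing (MPP) systems. The data distribution scheme directly affects the parallel performance of an application on an MPP system. Taking MPP Fortran as an example, this paper examines in detail the techniques involved in automatic data distribution, such as alignment analysis, generation of distribution schemes, static performance estimation, and data redistribution, and proposes corresponding algorithms. These algorithms will be implemented in the automatic program parallelization tool for MPP Fortran that the authors are developing.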

12.

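FAX, an automatic program parallelization tool

郭克榕 唐新春 曾丽芳 《计算机工程与应用》1999,(9)

This paper gives a system overview of FAX (Fortran Automated Xlator), an automatic program parallelization tool for massively parallel processing systems, and focuses on the advanced techniques used in FAX. Test results show that FAX is already reasonably usable and effective and that, as an automatic parallelization tool for distributed-memory parallel machines, it has largely met its design goals.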

13.

Implementation of an ADI Method on parallel computers

Raad A. Fatoohi Chester E. Grosch 《Journal of scientific computing》1987,2(2):175-193

In this paper we discuss the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
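Each half-step of an ADI method reduces to a set of independent tridiagonal systems. The Python sketch below shows Gaussian elimination specialized to one such system (often called the Thomas algorithm). It is only an illustration: it assumes a system that needs no pivoting (for example, a diagonally dominant one), the function name is hypothetical, and it does not reproduce the paper's implementations or the cyclic elimination variant used on the MPP.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system of n equations.
    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[n-1] unused), d: right-hand side."""
    n = len(b)
    cp = [0.0] * n                      # modified super-diagonal
    dp = [0.0] * n                      # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):               # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# A 4x4 system with diagonals (-1, 2, -1) whose exact solution is [1, 2, 3, 4].
a = [0.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]
d = [0.0, 0.0, 0.0, 5.0]
x = solve_tridiagonal(a, b, c, d)
```

The forward sweep carries a dependence from each row to the next, which helps explain why a different elimination scheme is attractive on a machine with many simple processors.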

14.

Errors in vector processing and the library libavi.a

Tiarajú A. Diverio Ursula A. Fernandes Dalcidio M. Claudio 《Reliable Computing》1996,2(2):103-109

In this paper, we describe the results of several tests that check the accuracy of numerical computation on the Cray supercomputer in vector and scalar modes. The known tests were modified to identify the critical point where roundings start causing problems. After describing the tests, we present an interval library called libavi.a. It was developed in Fortran 90 on the Cray Y-MP2E supercomputer of UFRGS-Brazil. This library makes interval mathematics accessible to Cray supercomputer users. It works with real and complex intervals and interval matrices and vectors. The library allows overloading of operators and functions. It is organized in four modules: real intervals, interval vectors and matrices, complex intervals, and linear algebra applications.
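To give a sense of what a real interval type with overloaded operators looks like, here is a minimal Python sketch. It is not the libavi.a interface, which is written in Fortran 90. The class name `Interval` is hypothetical, and outward rounding is approximated by widening each bound by one floating-point step with `math.nextafter` (Python 3.9 or later) rather than by switching hardware rounding modes.

```python
import math

class Interval:
    """Closed real interval [lo, hi] with outward-widened arithmetic."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    @staticmethod
    def _outward(lo, hi):
        # Widen by one representable step on each side so the exact
        # result stays enclosed despite round-to-nearest arithmetic.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval._outward(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(products), max(products))

    def __contains__(self, x):
        return self.lo <= x <= self.hi

    def __repr__(self):
        return f"Interval({self.lo!r}, {self.hi!r})"

s = Interval(0.1) + Interval(0.2)        # encloses the rounded sum 0.1 + 0.2
p = Interval(-1.0, 2.0) * Interval(3.0, 4.0)
```

Widening after every operation is cruder than directed rounding and gives slightly wider intervals, but it keeps the sketch portable.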

15.

Parallel programming with Polaris Total citations: 1, self-citations: 0, citations by others: 1

Blume W. Doallo R. Eigenmann R. Grout J. Hoeflinger J. Lawrence T. 《Computer》1996,29(12):78-82

Parallel programming tools are limited, making effective parallel programming difficult and cumbersome. Compilers that translate conventional sequential programs into parallel form would liberate programmers from the complexities of explicit, machine oriented parallel programming. The paper discusses parallel programming with Polaris, an experimental translator of conventional Fortran programs that target machines such as the Cray T3D

16.

Parallelizing subroutines in sequential programs

Chih-Ping Chu Carver D.L. 《Software, IEEE》1994,11(1):77-85

An algorithm for making sequential programs parallel is described, which first identifies all subroutines, then determines the appropriate execution mode and restructures the code. It works recursively to parallelize the entire program. We use Fortran in our work, but many of the concepts apply to other languages. Our hardware model is a shared-memory multiprocessor system with a fixed number of identical processors, each with its own local memory connected to a common memory that is accessible to all processors equally. The model implements interprocessor synchronization and communication via special memory locations or special storage. Systems like the Cray X-MP, IBM 3090, and Alliant FX/8 fit this model. Our input is a sequential, structured Fortran program with no overlapping branches. With today's emphasis on writing structured code, this restriction is reasonable. A prototype of a system to implement the algorithm is under development on an IBM 3090 multiprocessor.

17.

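A multi-version technique for external procedure calls in MPP Fortran programs Total citations: 3, self-citations: 0, citations by others: 3

郭克榕 唐新春 《计算机工程与设计》1996,17(6):49-54

When designing parallel programs in MPP Fortran, an improperly chosen version of an external procedure call, or a mismatch between the shared and private types of dummy and actual arguments, can cause the program to fail or perform poorly. This paper proposes an effective solution: a multi-version technique for external procedure calls.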

18.

Optimization and Performance of a Fortran 90 MPI-Based Unstructured Code on Large-Scale Parallel Systems

Shires Dale Mohan Ram 《The Journal of supercomputing》2003,25(2):131-141

The message-passing interface (MPI) has become the standard in achieving effective results when using the message passing paradigm of parallelization. Codes written using MPI are extremely portable and are applicable to both clusters and massively parallel computing platforms. Since MPI uses the single program, multiple data (SPMD) approach to parallelism, good performance requires careful tuning of the serial code as well as careful data and control flow analysis to limit communication. We discuss optimization strategies used and their degree of success to increase performance of an MPI-based unstructured finite element simulation code written in Fortran 90. We discuss performance results based on implementations using several modern massively parallel computing platforms including the SGI Origin 3800, IBM Nighthawk 2 SMP, and Cray T3E-1200.

19.

A high-performance fast Fourier transform algorithm for the Cray-2

David H. Bailey 《The Journal of supercomputing》1987,1(1):43-60

Most implementations of a radix-2 fast Fourier transform on large scientific computers use algorithms that involve memory accesses whose strides are powers of two. (The term stride means the memory increment between successive elements stored or fetched.) Such strides are unacceptable for recently developed supercomputers, particularly the Cray-2, because of serious difficulties with memory bank conflicts. This article describes an algorithm for evaluating the fast Fourier transform that avoids this difficulty and thus could provide the basis for implementations that more fully utilize the power of the Cray-2. A Fortran program implementing this algorithm is included, and timing comparisons with the Cray assembly-coded library subroutine are shown. The author is with the Numerical Aerodynamic Simulation Systems Division at NASA Ames Research Center.
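The effect of power-of-two strides on an interleaved memory can be seen with a toy model in which word address `a` lives in bank `a mod nbanks`. The default of 128 banks and the function name are assumptions chosen for illustration; the model ignores bank busy times and is not a description of the actual Cray-2 memory system.

```python
def banks_touched(stride, nbanks=128, accesses=128):
    """Number of distinct banks hit by `accesses` consecutive references
    separated by `stride` words, in a low-order-interleaved memory model."""
    return len({(i * stride) % nbanks for i in range(accesses)})

unit = banks_touched(1)          # consecutive words
half = banks_touched(64)         # power-of-two stride
full = banks_touched(128)        # stride equal to the bank count
odd = banks_touched(65)          # odd stride
```

In this model the number of banks used is `nbanks` divided by the greatest common divisor of the stride and `nbanks`, so a stride of 64 keeps returning to the same two banks while an odd stride such as 65 spreads references over all of them.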

20.

SUPPLE: An efficient run-time support for non-uniform parallel loops

Salvatore Orlando Raffaele Perego 《Journal of Systems Architecture》1999,45(15):1323-1343

This paper presents SUPPLE (SUPport for Parallel Loop Execution), an innovative run-time support for the execution of parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies upon a static block data distribution to exploit locality, and combines static and dynamic policies for scheduling non-uniform iterations. It adopts, as far as possible, a static scheduling policy derived from the owner computes rule, and moves data and iterations among processors only if a load imbalance actually occurs. SUPPLE always tries to overlap communications with useful computations by reordering loop iterations and prefetching remote ones in the case of workload imbalance. The SUPPLE approach has been validated by many experimental results obtained by running a multi-dimensional flame simulation kernel on a 64-node Cray T3D. We have fed the benchmark code with several synthetic input data sets built on the basis of a load imbalance model. We have compared our results with those obtained with a CRAFT Fortran implementation of the benchmark.