
1.

Data management and control-flow aspects of an SIMD/SPMD parallel language/compiler

Nichols M.A. Siegel H.J. Dietz H.G. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(2):222-234

Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines.
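The idea of one data-dependent conditional with equivalent SIMD and SPMD versions can be sketched as follows. This is only an illustrative emulation in Python, not the language or compiler described in the abstract; the function names `simd_if_else` and `spmd_if_else` and the list-of-PEs model are assumptions made for the example.

```python
def simd_if_else(data, cond, then_op, else_op):
    """Emulate lockstep PEs: one instruction stream, two masked passes."""
    mask = [cond(x) for x in data]  # every PE evaluates the condition
    out = list(data)
    # Pass 1: the "then" instructions are broadcast; only enabled PEs execute.
    for pe, x in enumerate(data):
        if mask[pe]:
            out[pe] = then_op(x)
    # Pass 2: the mask is complemented and the "else" instructions are broadcast.
    for pe, x in enumerate(data):
        if not mask[pe]:
            out[pe] = else_op(x)
    return out


def spmd_if_else(data, cond, then_op, else_op):
    """Emulate SPMD: each PE runs the same program with its own control flow."""
    def pe_program(x):
        # Each PE executes only the branch selected by its local data.
        return then_op(x) if cond(x) else else_op(x)
    return [pe_program(x) for x in data]


if __name__ == "__main__":
    values = [4, -9, 16, -1]  # hypothetical per-PE data
    print(simd_if_else(values, lambda x: x >= 0, lambda x: 2 * x, lambda x: -x))
    print(spmd_if_else(values, lambda x: x >= 0, lambda x: 2 * x, lambda x: -x))
```

Both functions return the same result for the same inputs. In the SIMD emulation every PE waits through both passes, while in the SPMD emulation each PE executes a single branch.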

2.

Adapting a Navier-Stokes solver for three parallel machines

R. A. Fatoohi 《The Journal of supercomputing》1994,8(2):91-115

This paper presents the results of parallelizing a three-dimensional Navier-Stokes solver on a 32K-processor Thinking Machines CM-2, a 128-node Intel iPSC/860, and an 8-processor CRAY Y-MP. The main objective of this work is to study the performance of the flow solver, INS3D-LU code, on two distributed-memory machines, a massively parallel SIMD machine (CM-2) and a moderately parallel MIMD machine (iPSC/860), and compare it with its performance on a shared-memory MIMD machine with a small number of processors (Y-MP). The code is based on a Lower-Upper Symmetric-Gauss-Seidel implicit scheme for the pseudocompressibility formulation of the three-dimensional incompressible Navier-Stokes equations. The code was rewritten in CMFORTRAN with shift operations and run on the CM-2 using the slicewise model. The code was also rewritten with distributed data and Intel message-passing calls and run on the iPSC/860. The timing results for two grid sizes are presented and analyzed using both 32-bit and 64-bit arithmetic. Also, the impact of communication and load balancing on the performance of the code is outlined. The results show that reasonable performance can be achieved on these parallel machines. However, the CRAY Y-MP outperforms the CM-2 and iPSC/860 for this particular algorithm. The author is an employee of Computer Sciences Corporation. This work was funded through NASA Contract NAS 2-12961.

3.

Experimental application-driven architecture analysis of an SIMD/MIMD parallel processing system

Bronson E.C. Casavant T.L. Jamieson L.H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):195-205

An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computationally intensive algorithms such as the FFT.
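For readers unfamiliar with the metrics named above, speedup and efficiency are conventionally defined as S = T1 / Tp and E = S / p. The sketch below uses these textbook definitions; the abstract does not state the exact formulas the authors used, and the timing values in the usage example are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup: one-processor time divided by p-processor time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Conventional efficiency: speedup per processor (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from the paper.
    print(speedup(100.0, 12.5))
    print(efficiency(100.0, 12.5, 16))
```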

4.

Unstructured tree search on SIMD parallel computers

Karypis G. Kumar V. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(10):1057-1072

We present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on an SIMD machine consists of two major components: a triggering mechanism, which determines when the search space redistribution must occur to balance the search space over processors, and a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms and verify the results experimentally. The analysis and experiments show that our new load-balancing methods are highly scalable on SIMD architectures. Their scalability is shown to be no worse than that of the best load-balancing schemes on MIMD architectures. We verify our theoretical results by implementing the 15-puzzle problem on a CM-2 SIMD parallel computer.
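The two components named in the abstract, a trigger and a redistribution scheme, can be illustrated with a deliberately simple sketch. The threshold rule and the donor/idle pairing below are assumptions chosen for illustration; they are not the new mechanisms the authors propose, which the abstract does not specify. The names `needs_rebalance` and `redistribute` are hypothetical.

```python
def needs_rebalance(work, threshold=0.5):
    """Trigger (assumed rule): rebalance when the fraction of PEs that
    still hold unexpanded nodes falls below `threshold`."""
    active = sum(1 for stack in work if stack)
    return active / len(work) < threshold


def redistribute(work):
    """Redistribution (assumed rule): pair the most loaded PEs with idle PEs
    and hand each idle PE half of its donor's nodes."""
    donors = sorted(
        (pe for pe, stack in enumerate(work) if len(stack) > 1),
        key=lambda pe: -len(work[pe]),
    )
    idle = [pe for pe, stack in enumerate(work) if not stack]
    for donor, receiver in zip(donors, idle):
        half = len(work[donor]) // 2
        work[receiver] = work[donor][:half]
        work[donor] = work[donor][half:]
    return work


if __name__ == "__main__":
    # Hypothetical per-PE stacks of search-tree nodes.
    stacks = [[1, 2, 3, 4], [], [], [5]]
    if needs_rebalance(stacks, threshold=0.75):
        print(redistribute(stacks))
```

On a SIMD machine the trigger matters because idle PEs still wait through every lockstep expansion step, so the cost of a redistribution phase has to be weighed against the cycles lost to idling.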

5.

The Genesis distributed-memory benchmarks. Part 1: Methodology and general relativity benchmark with results for the SUPRENUM computer

Cliff Addison James Allwright Norman Binsted Nigel Bishop Bryan Carpenter Peter Dalloz David Gee Vladimir Getov Tony Hey Roger Hockney Max Lemke John Merlin Mark Pinches Chris Scott Ivan Wolton 《Concurrency and Computation》1993,5(1):1-22

This is the first of a series of papers on the Genesis distributed-memory benchmarks, which were developed under the European ESPRIT research program. The benchmarks provide a standard reference Fortran77 uniprocessor version, a distributed-memory MIMD version, and in some cases a Fortran90 version suitable for SIMD computers. The problems selected all have a scientific origin (mostly from physics or theoretical chemistry), and range from synthetic code fragments designed to measure the basic hardware properties of the computer (especially communication and synchronisation overheads), through commonly used library subroutines, to full application codes. This first paper defines the methodology to be used to analyse the benchmark results, and gives an example of a fully analysed application benchmark from General Relativity (GR1). First, suitable absolute performance metrics are carefully defined, then the performance analysis treats the execution time and absolute performance as functions of at least two variables, namely the problem size and the number of processors. The theoretical predictions are compared with, or fitted to, the measured results, and then used to predict (with due caution) how the performance might scale for larger problems and more processors than were actually available during the benchmarking. Benchmark measurements are given primarily for the German SUPRENUM computer, but also for the IBM 3083J, Convex C210 and a Parsys Supernode with 32 T800-20 transputers.

6.

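Design and implementation of a data-parallel language

Chen Siyu (陈斯愈), Huang Linpeng (黄林鹏) 《计算机工程》 (Computer Engineering) 1997,23(3):3-6

Applying the data-parallel model to MIMD machines, so as to obtain loosely synchronous execution in SPMD mode, is receiving growing attention. This paper presents the design and implementation of Multi-c, a data-parallel language whose target environment is a heterogeneous parallel system. The Multi-c compiler, currently being implemented, works as a precompiler: it accepts program descriptions written in SIMD form, relaxes their synchronization requirements, and generates C programs that can run on the parallel system in SPMD mode.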

7.

Functionally reconfigurable general purpose parallel machines and some image processing and pattern recognition applications

Nikola K Kasabov 《Pattern recognition letters》1985,3(3):215-223

Functionally reconfigurable general purpose parallel machines (FRPM) can be reconfigured during operation from SIMD to MIMD mode or vice versa (first aspect) and from one interconnection network to another according to the data storing order (second aspect). General purpose machines are considered in order to obtain an arbitrary data exchange between the processing elements they are built of. A model for describing such interconnection networks is presented. A full-information exchange network is introduced which can be reconfigured by program into tree, matrix, cube, linear-neighbourhood and FFT networks. Some schemes for constructing SIMD/MIMD reconfigurable machines are given. The usefulness of using FRPM for image processing and pattern recognition is discussed.

8.

Parallel simulation of DEDS via event synchronization

Jian-Qiang Hu 《Discrete Event Dynamic Systems》1995,5(2-3):167-186

In this paper we use the event synchronization scheme to develop a new method for parallel simulation of many discrete event dynamic systems simultaneously. Though a few parallel simulation methods have been developed during the last several years, such as the well-known Standard Clock method, most of them are largely limited to Markovian systems. The main advantage of our method is its applicability to non-Markovian systems. For Markovian systems a comparison study on efficiency between our method and the Standard Clock method is done on Connection Machine CM-5. CM-5 is a parallel machine with both SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, Multiple Data) architectures. The simulation results show that if event rates of Markovian systems do not differ by much then both methods are compatible but the Standard Clock method performs better in most cases. For Markovian systems with very different event rates, our method often yields better results. Most importantly, our simulation results also show that our method works as efficiently for non-Markovian systems as for Markovian systems.

9.

Image understanding architecture: exploiting potential parallelism in machine vision

Weems C.C. Riseman E.M. Hanson A.R. 《Computer》1992,25(2):65-68

A hardware architecture that addresses at least part of the potential parallelism in each of the three levels of vision abstraction, low (sensory), intermediate (symbolic), and high (knowledge-based), is described. The machine, called the image understanding architecture (IUA), consists of three different, tightly coupled parallel processors: the content addressable array parallel processor (CAAPP) at the low level, the intermediate communication associative processor (ICAP) at the intermediate level, and the symbolic processing array (SPA) at the high level. The CAAPP and ICAP levels are controlled by an array control unit (ACU) that takes its directions from the SPA level. The SPA is a multiple-instruction multiple-data (MIMD) parallel processor, while the intermediate and low levels operate in multiple modes. The CAAPP operates in single-instruction multiple-data (SIMD) associative or multiassociative mode, and the ICAP operates in single-program multiple-data (SPMD) or MIMD mode.

10.

Integrating multiple parallel programming paradigms in a dataflow-based software environment

Gang Cheng Geoffrey C. Fox 《Concurrency and Computation》1996,8(9):667-684

By viewing different parallel programming paradigms as essentially heterogeneous approaches in mapping ‘real-world’ problems to parallel systems, the authors discuss methodologies in integrating multiple programming models on a massively parallel system such as Connection Machine CM5. Using a dataflow based integration model built in a visualization software AVS, the authors describe a simple, effective and modular way to couple sequential, data-parallel and explicit message-passing modules into an integrated parallel programming environment on a CM5. A case study in the area of numerical advection modeling is given to demonstrate the integration of data-parallel and message-passing modules in the proposed multi-paradigm programming environment.

11.

Parallel solution of large-scale differential-algebraic systems

R. S. Maier W. Rath L. R. Petzold 《Concurrency and Computation》1995,7(8):795-822

DASPK solves large-scale systems of differential-algebraic equations. It is based on the integration method in DASSL, but instead of a direct method for the associated linear systems which arise at each time step, the preconditioned GMRES iteration is applied in combination with an inexact Newton method. Two parallel versions of DASPK have been developed: DASPKF90, a Fortran 90 data parallel implementation, and DASPKMP, a message-passing implementation written in Fortran 77 with extended BLAS. The parallel versions have been implemented for the Thinking Machines Corporation (TMC) CM-5, a massively parallel multiprocessor, keeping the user interface relatively simple while allowing for portability to other massively parallel architectures. The codes have been demonstrated on several large-scale test problems, including three-dimensional formulations of the heat equation, the Cahn-Hilliard equation and a multi-species reaction-diffusion problem. The formulations are described, including detail on preconditioning the Krylov iteration, timing results and performance analysis.

12.

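Research and implementation of the X stream processor

Jia Chuan (贾川), Sui Bingcai (隋兵才) 《计算机与现代化》 (Computer and Modernization) 2007,(2):123-126

This paper compares the strengths and weaknesses of two typical stream processor architectures, MIMD and SIMD, presents one way of implementing an SIMD stream processor, and describes the two-level programming model used on stream processors. The study shows that the stream processor, as a new type of processor, has advantages in many application areas.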

13.

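A time-period rotation parallel algorithm for multiobjective dynamic programming

Kang Yimei (康一梅), Wu Cangpu (吴沧浦) 《自动化学报》 (Acta Automatica Sinica) 1994,20(5):561-569

A time-period rotation parallel algorithm for multiobjective dynamic programming is proposed for parallel machines with SIMD and MIMD architectures. The time-period rotation iterative algorithm converts the optimization of the whole process into the optimization of subprocesses and then searches the noninferior solution sets of the subprocesses for the noninferior solutions of the whole process. In this way the insufficient-memory problem of multiobjective dynamic programming is turned into a problem of computation time, which the very high speed of parallel machines can then address effectively. An analysis of time complexity and speedup, together with a worked example, demonstrates the effectiveness and advantages of the algorithm.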
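The notion of a noninferior (Pareto-optimal) solution set used in this abstract can be made concrete with a small filter. This is a generic sketch that assumes every objective is to be minimized; it does not reproduce the paper's rotation algorithm, and the names `dominates` and `noninferior` are hypothetical.

```python
def dominates(a, b):
    """True if objective vector `a` is no worse than `b` in every objective
    and strictly better in at least one (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def noninferior(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical two-objective costs of candidate sub-process solutions.
    candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(noninferior(candidates))
```

A filter of this kind keeps each intermediate solution set small, which is relevant to the memory concern the abstract raises.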

14.

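A strategy for partitioning parallel tasks based on systems of state equations (Total citations: 3, self-citations: 0, citations by others: 3)

Chen Delai (陈德来), Jiao Jin (焦进), et al. 《计算机学报》 (Chinese Journal of Computers) 1996,19(5):282-287

After analyzing how inter-machine communication affects performance when a system of state equations is solved in parallel, this paper proposes a task-partitioning strategy that reduces both the solution time and the volume of inter-machine communication data, and implements the strategy with a simulated annealing algorithm. The results show that the parallel solution tasks produced by this partitioning strategy are load balanced and achieve high parallel speedup, and that the strategy is applicable to various MIMD systems.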
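A simulated-annealing partitioner of the general kind described here might look like the sketch below. The cost function (largest processor load plus weighted volume of cross-processor communication), the single-task move, and the geometric cooling schedule are all assumptions made for illustration; the abstract does not give the authors' actual formulation. The names `partition_cost` and `anneal_partition` are hypothetical.

```python
import math
import random


def partition_cost(assign, work, comm, n_procs, comm_weight=1.0):
    """Assumed cost: the heaviest processor load plus the weighted volume
    of data exchanged between tasks placed on different processors."""
    loads = [0.0] * n_procs
    for task, proc in enumerate(assign):
        loads[proc] += work[task]
    cut = sum(vol for (a, b), vol in comm.items() if assign[a] != assign[b])
    return max(loads) + comm_weight * cut


def anneal_partition(work, comm, n_procs, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over task-to-processor assignments."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_procs) for _ in work]
    cost = partition_cost(assign, work, comm, n_procs)
    best, best_cost = assign[:], cost
    temp = t0
    for _ in range(steps):
        task = rng.randrange(len(work))
        old_proc = assign[task]
        assign[task] = rng.randrange(n_procs)  # propose moving one task
        new_cost = partition_cost(assign, work, comm, n_procs)
        delta = new_cost - cost
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = assign[:], cost
        else:
            assign[task] = old_proc  # reject the move
        temp *= cooling  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical workload: per-task compute cost and pairwise data volumes.
    work = [4.0, 3.0, 2.0, 2.0, 1.0, 1.0]
    comm = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 2.0, (4, 5): 1.0}
    print(anneal_partition(work, comm, n_procs=2))
```

Folding load balance and communication volume into one scalar cost is a design choice of this sketch; the `comm_weight` parameter controls the trade-off between the two.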

15.

Signal processing with transputer arrays (TRAPS)

《Computer Physics Communications》1985,37(1-3):77-86

The transputer and its programming language occam present new possibilities in designing and programming highly concurrent digital processors. This paper describes early work in evaluating the signal processing capability of SIMD/MIMD TRansputer Array Processors or TRAPs using hardware emulators in lieu of single chip transputers. Factors controlling the communication overheads in the array are considered. Algorithms for two-dimensional convolution, thresholding, histogramming and segmentation of images and for 1-D FFTs have been programmed and run. The efficiency of a ‘wavefront’ processor for least-squares minimisation has been explored in real time experiments. Performance estimates made are related to the real transputer.

16.

A Data Parallel Scientific Modeling Language

《Journal of Parallel and Distributed Computing》1994,21(1):46-60

The data parallel meta language (DPML) and its associated Fortran source code rewriter (DP77) support architecture independent, high performance climate and weather prediction models. The language allows the data domains over which a program operates, the communication patterns required between elements of those data domains, and some or all of the calculations of a program to be expressed at a very high level. DPML uses explicit data parallelism to express the inherent parallelism of the models, with the result that programs are easily compilable into target machine code. DP77 uses information from the DPML program to translate Fortran routines into the host specific Fortran form required for their parallel execution within the model. This paper describes the general strategy behind the development of DPML, discusses its language features using examples drawn from climate modelling, and provides details of the mechanism it uses for incorporating Fortran into data parallel programs. Encouraging results are reported for DPML versions of the standard weather benchmark models executing on vector, SIMD, and MIMD (shared memory) machines. While the paper is set within the framework of climate modelling, the technique has obvious wider implications.

17.

DPOS: A metalanguage and programming environment for parallel processing

John D. Evans Robert R. Kessler 《LISP and Symbolic Computation》1992,5(1-2):105-125

This paper describes and illustrates a structured programming metalanguage (DPOS) and graphical programming environment for generating and debugging high-level distributed MIMD parallel programs. DPOS introduces an innovative message-passing model and also recursive graphical definition of parallel process networks. It also provides programming and debugging at the meta language level that is portable across implementation languages. The initial development focus of DPOS is to provide a parallel development system for Lisp-based, symbolic and artificial intelligence programs as part of the MAYFLY parallel processing project. The DPOS environment also generates source code and provides a simulation system for graphical debugging and animation of the programs in graph form.

18.

A Comparison of Implicitly Parallel Multithreaded and Data-Parallel Implementations of an Ocean Model

《Journal of Parallel and Distributed Computing》1998,48(1):1-51

Two parallel implementations of a state-of-the-art ocean model are described and analyzed: one is written in the implicitly parallel language Id for the Monsoon multithreaded dataflow architecture, and the other in data-parallel CM Fortran for the CM-5. The multithreaded programming model is inherently more expressive than the data-parallel model but is not especially adapted to regular data structures common to many scientific codes. One goal of this study is to understand what, if any, are the performance penalties of multithreaded execution when implementing a program that is well suited for data-parallel execution. To avoid technology and machine configuration issues, the two implementations are compared in terms of overhead cycles per required floating-point operation. When flows in complex geometries typical of ocean basins are simulated, the data-parallel model only remains efficient if redundant computations are performed over land. The generality of the Id programming model, however, allows one to easily and transparently implement a parallel code that computes only in the ocean. When ocean basins with complex and irregular geometry are simulated the normalized performance on Monsoon is comparable with that of the CM-5. For more regular geometries that map well to the computational domain, the data-parallel approach proves to be a better match. We conclude by examining the extent to which clusters of mainstream symmetric multiprocessor (SMP) systems offer a scientific computing environment which can capitalize on and combine the strengths of the two paradigms.

19.

Distributed implementations of communicating objects

Weijia Jia Gaetan Libert 《Concurrency and Computation》1995,7(6):515-541

The paper presents the design and implementation of a CSP-based object-oriented system. The system consists of a specification model, Communicating-object, and a prototype system, C-OBJECT, supporting the model. The objects execute in a set of parallel processes called actions. The dynamic communicating objects exchange messages by both data transmissions and function invocations. The C-OBJECT prototype is constructed in a MIMD architecture (32-node transputer) with C++ which is composed of two parts: network configuration and a Communicating-object service subsystem (library) providing various levels of message-passing primitives. The initial prototype with good performance has shown its availability for C and C++ programming. The integrated system facilitates application software with tools of specification, design and implementation.

20.

Sparse QR factorization on a massively parallel computer (Total citations: 1, self-citations: 0, citations by others: 1)

Steven G. Kratzer 《The Journal of supercomputing》1992,6(3-4):237-255

This paper shows that QR factorization of large, sparse matrices can be performed efficiently on massively parallel SIMD (single instruction stream/multiple data stream) computers such as the Connection Machine CM-2. The problem is cast as a dataflow graph, whose nodes are mapped to a virtual dataflow machine in such a way that only nearest-neighbor communication is required. This virtual machine is implemented by programming the CM-2 processors to support a restricted dataflow protocol. Execution results for several test matrices show that good performance can be obtained without relying on nested dissection techniques.