20 similar documents found.
1.
An environment that lets system applications be expressed as virtual machines, through which architecture-independent multiple-instruction, multiple-data stream (MIMD) programs are written, is described. The virtual machine hides the hardware configuration from the programmer so that the MIMD programming environment always appears the same, regardless of the actual hardware. The data-definition and procedural high-level languages used in the environment and the generation of object code in the environment are discussed. The runtime configuration of the system and an implemented prototype of the system are described.
2.
Various proposals for networks of large numbers of processors are reviewed. Bottleneck problems arise in these networks with the flow of data between processors. Communication problems which can arise in practical situations are discussed and techniques for reducing bottlenecks are developed. Some simulation results are given for the binary n-cube.
3.
Bronson E.C. Casavant T.L. Jamieson L.H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):195-205
An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computationally intensive algorithms such as the FFT.
4.
A parallel FFT on an MIMD machine (cited by: 5; self-citations: 0; citations by others: 5)
In this paper we present a parallelization of the Cooley-Tukey FFT algorithm that is implemented on a shared-memory MIMD (non-vector) machine that was built in the Dept. of Computer Science, Tel Aviv University. A parallel algorithm is presented for the one-dimensional Fourier transform, with performance analysis. For a large array of complex numbers to be transformed, an almost linear speed-up is demonstrated. This algorithm can be executed by any number of processors, but generally the number is much less than the length of the input data.
5.
Parallel Gaussian elimination on an MIMD computer (cited by: 3; self-citations: 0; citations by others: 3)
This paper introduces a graph-theoretic approach to analyse the performance of several parallel Gaussian-like triangularization algorithms on an MIMD computer. We show that the SAXPY, GAXPY and DOT algorithms of Dongarra, Gustavson and Karp, as well as parallel versions of the LDM^t, LDL^t, Doolittle and Cholesky algorithms, can be classified into four task graph models. We derive new complexity results and compare the asymptotic performance of these parallel versions.
6.
《Parallel Computing》1990,15(1-3):133-145
This paper describes a parallel algorithm for the LU decomposition of band matrices using Gaussian elimination. The matrix dimension is n × n with 2r−1 diagonals. When r lies between 1 and 2p, an optimal number of processors is determined from an equation given in the paper. When r lies between 2p and n, the number of processors, p, stated by Veldhorst is adopted (see [7]). For a band matrix with 2r−1 diagonals (r between 1 and 2p), a task scheduling procedure is defined with the aim of obtaining maximal parallelism in system operation, i.e. good load balancing. The architecture of the system is of MIMD type. The processors are connected via a common bus. Communication and synchronization are performed by message passing.
7.
The performance of a parallel program executed on a message-passing MIMD computer is determined mainly by the efficiency of the communication among the processors and the efficiency of the calculation carried out in each processor. In this paper we present the results of experiments on the communication efficiency of a T800 transputer-based system. The results of these experiments are used to determine the basic hardware parameters for the communication capabilities of the system. These parameters are the asymptotic rate of data transfer (r_∞) and the message length required to obtain half the asymptotic rate (n_1/2). These performance results will help us to evaluate new implementations or new architectures.
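The two parameters named in the abstract above can be illustrated with a small fit. The sketch below assumes the commonly used linear timing model t(n) = t0 + n / r_inf, under which the half-performance length is n_half = t0 * r_inf; the abstract does not state which model or fitting procedure the authors used. The function name `fit_hockney` and the timing data are hypothetical and synthetic, not measurements from the paper.

```python
def fit_hockney(lengths, times):
    """Least-squares fit of t(n) = t0 + n / r_inf.

    lengths: message lengths in bytes
    times:   transfer times in seconds
    Returns (r_inf, n_half), where n_half is the message length at which
    the achieved rate n / t(n) equals r_inf / 2 under this model.
    """
    k = len(lengths)
    mean_n = sum(lengths) / k
    mean_t = sum(times) / k
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx               # seconds per byte, i.e. 1 / r_inf
    t0 = mean_t - slope * mean_n    # startup (latency) term
    r_inf = 1.0 / slope
    n_half = t0 * r_inf
    return r_inf, n_half


# Synthetic timings generated from t0 = 5 microseconds and r_inf = 1e6 bytes/s.
lengths = [64, 256, 1024, 4096]
times = [5e-6 + n / 1.0e6 for n in lengths]
r_inf, n_half = fit_hockney(lengths, times)
```

Under this model the achieved rate is r_inf * n / (n + n_half), so messages much shorter than n_half are dominated by startup cost.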
8.
A mesh-vertex finite volume scheme for solving the Euler equations on triangular unstructured meshes is implemented on a MIMD (multiple instruction/multiple data stream) parallel computer. Three partitioning strategies for distributing the work load onto the processors are discussed. Issues pertaining to the communication costs are also addressed. We find that the spectral bisection strategy yields the best performance. The performance of this unstructured computation on the Intel iPSC/860 compares very favorably with that on a one-processor CRAY Y-MP/1 and an earlier implementation on the Connection Machine. The authors are employees of Computer Sciences Corporation. This work was funded under contract NAS 2-12961.
9.
A parallel ray tracing algorithm is presented. It subdivides the scene into 3D regions, the adjacency of which is modelled by a connectivity graph of regions. Since a ray tracing process is associated with each region, this graph becomes a graph of processes, the edges of which represent the communications between processes. This graph of processes is suitably mapped onto a hypercube topology so as to minimize the communication cost. Static load balancing is performed, and solutions are given to the problems of network congestion and termination. This work has been supported by C3 and by the CCETT (Centre Commun d'Etudes de Télédiffusion et Télécommunications) under contract 86ME46.
10.
11.
12.
Some aspects of a long-term parallel-processing research project (PACS, Pax, and Qcd Pax) begun in 1977 at Kyoto University and Hitachi Corporation's Nuclear Power Division are discussed. The discussion is based on an analysis of a number of papers, a book detailing this work, several visits to the project laboratory in Japan, and an examination of some programs that now run on Qcd Pax. The initial name, processor array for continuum simulation (PACS), was soon changed to Processor Array experiment, or Pax. Qcd Pax (for quantum chromodynamics) is the current running computer. The characteristics of the family are described, and the hardware, communication, and memory functions of the host computer, the use of four levels of parallelism programming, and performance are examined.
13.
A three-dimensional electromagnetic particle-in-cell code with Monte Carlo collision (PIC-MCC) is developed for MIMD parallel supercomputers. This code uses a standard relativistic leapfrog scheme incorporating Monte Carlo calculations to push plasma particles and to include collisional effects on particle orbits. A local finite-difference time-domain method is used to update the self-consistent electromagnetic fields. The code is implemented using the General Concurrent PIC (GCPIC) algorithm, which uses domain decomposition to divide the computation among the processors. Particles must be exchanged between processors as they move among subdomains. Message passing is implemented using the Express Cubix library and PVM. We evaluate the performance of this code using a 512-processor Intel Touchstone Delta, a 512-processor Intel Paragon, and a 256-processor CRAY T3D. A parallel efficiency exceeding 95% is achieved on all three machines for large problems. We have run PIC-MCC simulations using several hundred million particles with several million collisions per time step. For these large-scale simulations the particle push time achieved is in the range of 90–115 ns/particle/time step, and the collision calculation time is in the range of a few hundred nanoseconds per collision.
14.
van Reeuwijk K. Denissen W. Sips H.J. Paalvast E.M.R.M. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(9):897-914
Data parallel languages, like High Performance Fortran (HPF), support the notion of distributed arrays. However, the implementation of such distributed array structures and their access on message passing computers is not straightforward. This holds especially for distributed arrays that are aligned to each other and given a block-cyclic distribution. In this paper, an implementation framework is presented for HPF distributed arrays on message passing computers. Methods are presented for efficient (in space and time) local index enumeration, local storage, and communication. Techniques for local set enumeration provide the basis for constructing local iteration sets and communication sets. It is shown that both local set enumeration and local storage schemes can be derived from the same equation. Local set enumeration and local storage schemes are shown to be orthogonal, i.e., they can be freely combined. Moreover, for linear access sequences generated by our enumeration methods, the local address calculations can be moved out of the enumeration loop, yielding efficient local memory address generation. The local set enumeration methods are implemented by using a relatively simple general transformation rule for absorbing ownership tests. This transformation rule can be repeatedly applied to absorb multiple ownership tests. Performance figures are presented for local iteration overhead, a simple communication pattern, and storage efficiency.
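To make the block-cyclic distribution mentioned above concrete, here is a minimal sketch of the textbook mapping for a one-dimensional array with no alignment or stride. It is not the paper's framework or its transformation rule; the function names and the parameters `b` (block size) and `P` (number of processors) are assumptions chosen for illustration. `local_set` enumerates a processor's indices directly, which is one simple way to avoid testing ownership for every global index.

```python
def owner(i, b, P):
    """Processor that owns global index i (blocks of size b dealt round-robin to P processors)."""
    return (i // b) % P


def local_index(i, b, P):
    """Position of global index i within its owner's local storage."""
    return (i // (b * P)) * b + i % b


def local_set(p, n, b, P):
    """Global indices in range(n) owned by processor p, enumerated without an ownership test."""
    indices = []
    # Processor p owns the blocks starting at p*b, p*b + b*P, p*b + 2*b*P, ...
    for start in range(p * b, n, b * P):
        indices.extend(range(start, min(start + b, n)))
    return indices


# Example: 10 elements, block size 2, 3 processors.
n, b, P = 10, 2, 3
sets = [local_set(p, n, b, P) for p in range(P)]
```

For this example, processor 0 holds global indices 0, 1, 6, 7, and global index 7 sits at local position 3.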
15.
This paper reports on the initial stages of a research project involving the development of an experimental parallel computing system. The system is intended to support research in a variety of areas, including computer architectures, task models, language implementations and interconnection techniques. A pragmatic approach is being undertaken to evaluating techniques in these areas through applying the system to actual applications.
The system is based on a hierarchical structure of processors, using shared memory as the communication medium. This paper identifies the main features of the hardware being used, and presents an outline of the initial software for task creation and management.
16.
17.
18.
We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
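The 36-qubit and 1 TB figures above are consistent with a simple back-of-the-envelope count. The sketch below assumes the full state vector of 2^n complex amplitudes is stored, at 16 bytes per amplitude (two double-precision floats); the abstract does not state the storage format, so that size is an assumption, as are the function names. The even split across processors is likewise only illustrative.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full n-qubit state vector: 2**n complex amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


def per_processor_bytes(n_qubits, n_procs, bytes_per_amplitude=16):
    """Share of the state vector per processor, assuming an even split."""
    return state_vector_bytes(n_qubits, bytes_per_amplitude) // n_procs


total = state_vector_bytes(36)          # 2**40 bytes, i.e. 1 TiB
share = per_processor_bytes(36, 4096)   # 2**28 bytes, i.e. 256 MiB
```

Each additional qubit doubles the memory requirement, which is why the qubit count, rather than the processor count, sets the limit for this kind of simulation.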
19.
William J. Dally 《New Generation Computing》1993,11(3-4):227-249
Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.
20.
An effective speedup metric for measuring productivity in large-scale parallel computer systems (cited by: 1; self-citations: 0; citations by others: 1)
As parallel computer systems scale up, the measure of system performance must shift from traditional “high performance” to “high productivity.” This brings a new challenge: defining a synthetic yet meaningful measure over multiple productivity variables, namely computing performance, reliability, energy consumption, parallel software development, etc. Traditional measures for large-scale parallel computer systems focus only on computing performance and cannot measure the multiple productivity variables simultaneously in an effective manner. A recently proposed market-related money model, which pursues a high utility/cost ratio, relies on money as a measure to account for the multiple productivity variables. In contrast to previous models, this paper proposes a novel system productivity speedup metric for large-scale parallel computer systems. The metric uses speedup instead of money to unify the measures of multiple productivity variables. Finally, we propose a trade-off productivity measurement that weighs different productivity variables to address different design targets. The measurement can facilitate system evaluation, expose future technology trends, and guide future system design.