Similar Documents
 20 similar documents found (search time: 31 ms)
1.
An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to compare these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computationally intensive algorithms such as the FFT.
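The speedup and efficiency figures extrapolated in the study follow the standard definitions; a minimal sketch of that arithmetic (illustrative only, with made-up numbers, not the PASM measurement code):

```python
def speedup(t_serial, t_parallel):
    # Ratio of single-processor time to parallel time
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    # Fraction of ideal linear speedup actually achieved
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical figures: 1024 s serially, 2 s on 1024 processors
s = speedup(1024.0, 2.0)            # 512x speedup
e = efficiency(1024.0, 2.0, 1024)   # 50% efficiency
```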

2.
This paper describes special aspects of MIMD parallelization in SUPERB. SUPERB is an interactive SIMD/MIMD parallelizing system for the SUPRENUM machine. The main topic of this paper is the updating of distributed variables in parallelized applications. The intended applications perform local computations on a large data domain.

3.
There are two distinct types of MIMD (Multiple Instruction, Multiple Data) computers: the shared memory machine, e.g. Butterfly, and the distributed memory machine, e.g. Hypercubes, Transputer arrays. Typically these utilize different programming models: the shared memory machine has monitors, semaphores and fetch-and-add; whereas the distributed memory machine uses message passing. Moreover there are two popular types of operating systems: a multi-tasking, asynchronous operating system and a crystalline, loosely synchronous operating system.

In this paper I first describe the Butterfly, Hypercube and Transputer array MIMD computers, and review monitors, semaphores, fetch-and-add and message passing; then I explain the two types of operating systems and give examples of how they are implemented on these MIMD computers. Next I discuss the advantages and disadvantages of shared memory machines with monitors, semaphores and fetch-and-add, compared to distributed memory machines using message passing, answering questions such as “is one model ‘easier’ to program than the other?” and “which is ‘more efficient’?”. One may think that a shared memory machine with monitors, semaphores and fetch-and-add is simpler to program and runs faster than a distributed memory machine using message passing, but we shall see that this is not necessarily the case. Finally I briefly discuss which type of operating system to use and on which type of computer. This of course depends on the algorithm one wishes to compute.
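As a language-neutral illustration of the two programming models contrasted above (a sketch, not code from the paper), the same coordination problem can be phrased with a shared variable guarded by a lock, or with message passing over an explicit channel:

```python
import threading
import queue

# Shared-memory style: a variable protected by a lock (monitor-like discipline)
total = 0
lock = threading.Lock()

def shared_add(x):
    global total
    with lock:              # mutual exclusion, as a monitor would provide
        total += x

# Message-passing style: tasks communicate only through an explicit channel
channel = queue.Queue()

def sender(x):
    channel.put(x)          # send a message

def receiver():
    return channel.get()    # blocking receive

threads = [threading.Thread(target=shared_add, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# total now holds 0 + 1 + ... + 9

sender(42)
msg = receiver()
```

The shared-memory version relies on the lock being honored everywhere `total` is touched; the message-passing version makes all communication explicit, which is what distributed memory machines force on the programmer.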


4.
SUPRENUM (Superrechner für numerische Anwendungen) is a German supercomputer project. This paper describes a model for distributed recovery on the SUPRENUM multiprocessor. First we describe the architecture of the SUPRENUM multiprocessor and introduce some definitions. In the next section, the distribution of global system checkpoints and an algorithm for distributed reconfiguration are given.

5.
The SUPRENUM idea, the project, and the system have been described and presented in several papers. There are also a number of more detailed technical papers describing SUPRENUM as a whole or certain elements of it.

Here we want to give only a very general and rough survey of the essentials of the SUPRENUM system in order to enable the reader to categorize and understand the more specific SUPRENUM papers in this special issue.

Most of the supercomputer applications today are based on grid or grid-like data structures. Grid applications also play an essential role in the SUPRENUM development: in the top-down design of the architecture, in the programming environment, in the parallelization concept of algorithms, and, of course, in the application software development itself. We therefore place some emphasis on this grid orientation in our presentation.


6.
We present the concept of a pseudo-random tree, and generalize the Lehmer pseudo-random number generator as an efficient implementation of the concept. Pseudo-random trees can be used to give reproducibility, as well as speed, in Monte Carlo computations on parallel computers with either the SIMD architecture of the current generation of supercomputers or the MIMD architecture characteristic of the next generation. Monte Carlo simulations based on pseudo-random trees are also free of certain pitfalls, even on sequential computers, which can make them considerably more useful.
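The Lehmer generator underlying this construction is a multiplicative congruential generator, and a pseudo-random tree can be obtained by using two different multipliers: one to advance along a stream, one to spawn a child stream. A minimal sketch (the multipliers below are common Lehmer choices, not necessarily the paper's constants):

```python
M = 2 ** 31 - 1        # Mersenne prime modulus of the Lehmer generator
A_NEXT = 16807         # advances the current stream (a classic Lehmer multiplier)
A_SPAWN = 48271        # spawns a child stream; both constants are assumptions,
                       # not necessarily the multipliers used in the paper

def lehmer_next(x):
    """Next value in the current pseudo-random stream."""
    return (A_NEXT * x) % M

def lehmer_spawn(x):
    """Seed of a reproducible child stream: a new branch of the tree."""
    return (A_SPAWN * x) % M

# Reproducibility: any task that re-derives the same branch of the tree
# from the root seed observes exactly the same sequence of numbers.
root = 12345
x = lehmer_spawn(root)          # branch off a child stream
draws = []
for _ in range(3):
    x = lehmer_next(x)
    draws.append(x)
```

Because every branch is determined by the root seed and the path taken, a parallel Monte Carlo run can be replayed exactly regardless of how many processors execute it.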

7.
1. Introduction. Parallel processing is an effective way to improve computer performance and has become a focus of research in computer architecture. The SIMD (single instruction stream, multiple data stream) computer was introduced by M.J. Flynn in his 1966 classification of computer sys…

8.
The performance of conjugate gradient (CG) algorithms for the solution of the system of linear equations that results from the finite-differencing of the neutron diffusion equation was analyzed on SIMD, MIMD, and mixed-mode parallel machines. A block preconditioner based on the incomplete Cholesky factorization was used to accelerate the conjugate gradient search. The issues involved in mapping both the unpreconditioned and preconditioned conjugate gradient algorithms onto the mixed-mode PASM prototype, the SIMD MasPar MP-1, and the MIMD Intel Paragon XP/S are discussed. On PASM, the mixed-mode implementation outperformed either SIMD or MIMD alone. Theoretical performance predictions were analyzed and compared with the experimental results on the MasPar MP-1 and the Paragon XP/S. Other issues addressed include the impact on execution time of the number of processors used, the effect of the interprocessor communication network on performance, and the relationship of the number of processors to the quality of the preconditioning. Application studies such as this are necessary in the development of software tools for mapping algorithms onto either a single parallel machine or a heterogeneous suite of parallel machines.
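The preconditioned CG iteration mapped onto these machines has the following general shape; this sketch uses a simple Jacobi (diagonal) preconditioner and a small tridiagonal test matrix as stand-ins for the paper's block incomplete-Cholesky preconditioner and finite-difference system:

```python
import numpy as np

def pcg(A, b, apply_Minv, tol=1e-10, maxiter=500):
    """Preconditioned conjugate gradient for a symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                     # residual
    z = apply_Minv(r)                 # preconditioned residual
    p = z.copy()                      # search direction
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_Minv(r)
        rz_next = r @ z
        p = z + (rz_next / rz) * p    # update search direction
        rz = rz_next
    return x

# 1-D diffusion-like tridiagonal system (a stand-in test problem)
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A)
x = pcg(A, b, lambda r: r / d)        # Jacobi preconditioner
```

The matrix-vector product `A @ p` and the inner products are the operations whose SIMD/MIMD mappings the paper analyzes.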

9.
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity. A software layer manages cache space allocation at page granularity — similarly to distributed virtual shared memory (DVSM) systems — leaving simpler hardware to maintain shared memory coherence at cache-line granularity.

By reducing the hardware complexity, the machine cost and development time are reduced. We call the resulting hybrid hardware and software multiprocessor architecture Simple COMA. Preliminary results indicate that the performance of Simple COMA is comparable to that of more complex contemporary all-hardware designs.


10.
Parallel Computing, 1988, 7(3): 367–372
Various organization styles have been used in the architecture of supercomputers in order to achieve cost-effective performance and programmability. Traditionally, a particular organization style (e.g., vector pipeline processor, array processor, or multiprocessor) has been selected to satisfy the performance requirements of a class of applications, achieving usually a much lower performance in other applications. In addition, the mapping of ‘foreign’ algorithms to a single-style architecture may create great programming difficulties. Since each architecture style provides attractive cost-performance and programming features, the question of heterogeneity (i.e., combining of several architecture/design styles in a single system) deserves attention. In this paper we discuss some approaches to heterogeneous architectures, identify hardware and software issues, and analyze several built or proposed systems.

11.
On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned as the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architecture features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity-off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support the massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.

12.
It is an established trend that CPU development takes advantage of Moore's Law to improve in parallelism much more than in scalar execution speed. This results in higher hardware thread counts (MIMD) and improved vector units (SIMD), of which the MIMD developments have received the focus of library research and development in recent years. To make use of the latest hardware improvements, SIMD must receive a stronger focus of API research and development, because the computational power can no longer be neglected and auto-vectorizing compilers often cannot generate the necessary SIMD code, as will be shown in this paper. Nowadays, the SIMD capabilities are sufficiently significant to warrant vectorization of algorithms requiring more conditional execution than Streaming SIMD Extensions (SSE) was originally expected to handle. The Vc library ( http://compeng.uni-frankfurt.de/?vc ) was designed to support developers in the creation of portable vectorized code. Its capabilities and performance have been thoroughly tested. Vc provides portability of the source code, allowing full utilization of the hardware's SIMD capabilities, without introducing any overhead. Copyright © 2011 John Wiley & Sons, Ltd.
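Vc itself is a C++ library; as a language-neutral sketch of the transformation at issue, compare a scalar loop with per-element branching against its data-parallel form, where the branch becomes a per-lane mask — the same rewrite explicit SIMD code performs:

```python
import numpy as np

def saturate_scalar(xs, limit):
    # Scalar loop: one element at a time, with a real branch per element
    out = []
    for x in xs:
        out.append(limit if x > limit else x)
    return out

def saturate_vector(xs, limit):
    # Data-parallel form: the branch becomes a per-lane mask (SIMD style);
    # both arms are evaluated and blended by the mask, as vector units do
    xs = np.asarray(xs)
    return np.where(xs > limit, limit, xs)
```

Conditional code like this is exactly what auto-vectorizers often fail to transform, which is why libraries such as Vc expose masks and blends directly.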

13.
Programming SIMD hardware to interpret (in parallel) programs and data resident in each PE is a technique for obtaining a cost-effective massively parallel MIMD processing environment. The performance of the synthesized MIMD environment can be greatly improved by using a variable instruction interpreter that delays the interpretation of infrequent operations. In this paper, the process of building a variable instruction interpreter that optimizes an objective function is examined. Two different objective functions are considered, namely, maximizing the total instruction throughput (called Maximal MIMD Instruction Throughput, MMIT) and maximizing overall PE utilization (called Maximal MIMD PE Utilization, MMPU). We show that the decision version of both the MMIT and MMPU problems is NP-complete.
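The basic interpretation scheme can be sketched as follows: each PE holds its own program and program counter, and on every cycle the control unit broadcasts each opcode in turn; only PEs whose current instruction matches execute it. This is a toy model, not the paper's interpreter:

```python
def simd_interpret(programs, steps, opcodes=("ADD", "SUB", "NOP")):
    """Toy SIMD interpretation of MIMD programs: every PE keeps its own
    program and program counter; the control unit broadcasts each opcode
    in turn, and only PEs whose current instruction matches execute it."""
    pcs = [0] * len(programs)      # per-PE program counters
    regs = [0] * len(programs)     # per-PE accumulator registers
    for _ in range(steps):
        for op in opcodes:                          # one broadcast per opcode
            for i, prog in enumerate(programs):     # conceptually in parallel
                if pcs[i] < len(prog) and prog[pcs[i]][0] == op:
                    arg = prog[pcs[i]][1]
                    if op == "ADD":
                        regs[i] += arg
                    elif op == "SUB":
                        regs[i] -= arg
                    pcs[i] += 1
    return regs
```

A variable instruction interpreter would broadcast infrequent opcodes less often than frequent ones; choosing that broadcast schedule optimally is the problem (MMIT/MMPU) shown to be NP-complete.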

14.
A survey of parallel computer architectures
Duncan, R. Computer, 1990, 23(2): 5–16
An attempt is made to place recent architectural innovations in the broader context of parallel architecture development by surveying the fundamentals of both newer and more established parallel computer architectures and by placing these architectural alternatives in a coherent framework. The primary emphasis is on architectural constructs rather than specific parallel machines. Three categories of architecture are defined and discussed: synchronous architectures, comprising vector, SIMD (single-instruction-stream, multiple-data-stream) and systolic machines; MIMD (multiple-instruction-stream, multiple-data-stream) architectures with either distributed or shared memory; and MIMD-based paradigms, comprising MIMD/SIMD hybrid, dataflow, reduction, and wavefront types.

15.
A software behavioural simulator for a new massively parallel single-instruction/multiple-data (SIMD) architecture has been developed that can accurately simulate the entire 16,384-element bit-serial processor array. The key to this high-performance modelling is the exploitation of an inherent mapping that exists between massively parallel SIMD architectures and the vector architectures used in many high-performance scientific supercomputers. The new SIMD architecture, called BLITZEN, is based on the Massively Parallel Processor (MPP) built for NASA by Goodyear in the late 1970s. By simulating the full-scale machine with very high performance, the simulator allows development of algorithms and high-level software to proceed before realization of the hardware. This paper describes the SIMD-vector architecture mapping, the highly vectorized simulator in which it is used, and how the result was a simulator that achieved a level of performance three orders of magnitude faster than the conventional uniprocessor approach.
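The mapping the simulator exploits treats each bit-plane of the PE array as one vector: a bit-serial operation across all 16,384 PEs becomes a handful of whole-array bitwise operations. A sketch of a bit-serial ripple-carry add over bit-planes (illustrative only, not BLITZEN's actual simulator):

```python
import numpy as np

def to_planes(values, nbits):
    """Split per-PE integers into bit-planes (LSB first), one 0/1 array each."""
    v = np.asarray(values)
    return [(v >> i) & 1 for i in range(nbits)]

def from_planes(planes):
    """Reassemble per-PE integers from bit-planes."""
    return sum(p << i for i, p in enumerate(planes))

def bitserial_add(a_planes, b_planes):
    """Bit-serial ripple-carry add across the whole PE array.
    Every bitwise operation below touches all PEs at once — exactly the
    whole-array vector operation the SIMD-vector mapping exploits."""
    carry = np.zeros_like(a_planes[0])
    result = []
    for a, b in zip(a_planes, b_planes):
        result.append(a ^ b ^ carry)            # sum bit for every PE
        carry = (a & b) | (carry & (a ^ b))     # carry bit for every PE
    result.append(carry)                        # final carry plane
    return result
```

On a vector machine, each line in the loop body is one vector instruction over the full array, which is why the simulator runs orders of magnitude faster than a PE-by-PE uniprocessor simulation.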

16.
With the rapid improvement of computation capability in high-performance supercomputer systems, the imbalance between the computation subsystem and the storage subsystem has become more and more serious, especially as datasets ranging from tens of gigabytes up to terabytes are produced. To reduce this gap, large-scale storage systems need to be designed and implemented with high performance and scalability. The MilkyWay-2 (TH-2) supercomputer system, with a peak performance of 54.9 Pflops, clearly has this kind of requirement for its storage system. This paper introduces the storage system in the MilkyWay-2 supercomputer, including the hardware architecture and the parallel file system. The storage system exploits a novel hybrid hierarchical storage architecture to enable high scalability of I/O clients, I/O bandwidth and storage capacity. To fit this architecture, a user-level virtualized file system, named H2FS, is designed and implemented, which combines local storage and shared storage into a dynamic single namespace to optimize I/O performance in I/O-intensive applications. The evaluation results show that the storage system in the MilkyWay-2 supercomputer satisfies the critical requirements of a large-scale supercomputer, such as performance and scalability.

17.
SUPRENUM is the German supercomputer project aiming at the development and construction of a distributed-memory multiprocessor system. Within the SUPRENUM project, many application codes are either parallelized or newly developed. Using the concepts of the Abstract SUPRENUM Machine and some programming environment tools, these applications can be parallelized rather easily and straightforwardly.

18.
The Earth Simulator (ES), developed under the Japanese government’s initiative “Earth Simulator project”, is a highly parallel vector supercomputer system. In this paper, an overview of ES, its architectural features, hardware technology and the results of performance evaluation are described.

In May 2002, the ES was acknowledged to be the most powerful computer in the world: 35.86 teraflop/s for the LINPACK HPC benchmark and 26.58 teraflop/s for an atmospheric general circulation code (AFES). Such remarkable performance may be attributed to the following three architectural features: vector processors, shared memory and a high-bandwidth non-blocking crossbar interconnection network.

The ES consists of 640 processor nodes (PN) and an interconnection network (IN), which are housed in 320 PN cabinets and 65 IN cabinets. The ES is installed in a specially designed building, 65 m long, 50 m wide and 17 m high. In order to accomplish this advanced system, many kinds of hardware technologies have been developed, such as high-density and high-frequency LSIs, high-frequency signal transmission, high-density packaging, and a high-efficiency, low-noise cooling and power supply system, so as to reduce the overall volume of the ES and its total power consumption.

For highly parallel processing, a special synchronization mechanism connecting all nodes, the Global Barrier Counter (GBC), has been introduced.
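The GBC is in essence a hardware counter that releases all nodes once every node has checked in; a software analogue of that behaviour (a counter-based barrier, not the ES hardware) looks like:

```python
import threading

class BarrierCounter:
    """Counter-based barrier: the last arriver resets the count, advances
    the generation, and wakes every waiting node."""
    def __init__(self, n_nodes):
        self.n = n_nodes
        self.count = 0
        self.generation = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.n:        # last node to arrive
                self.count = 0
                self.generation += 1        # release the current generation
                self.cond.notify_all()
            else:
                while gen == self.generation:
                    self.cond.wait()        # block until the generation advances
```

The generation counter prevents a fast node from racing through the next barrier before slow nodes have left the previous one; in the ES this check-in and release is done in hardware across all 640 nodes.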


19.
Real-time image analysis requires the use of massively parallel machines. Conventional parallel machines consist of an array of identical processors organized in either single instruction, multiple data (SIMD) or multiple instruction, multiple data (MIMD) configurations. Machines of this type generally operate effectively only on parts of the image analysis problem: SIMD on the low-level processing and MIMD on the high-level processing. In this paper we describe the Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple-SIMD (MSIMD) organization, which can operate effectively at all levels of the image analysis problem.

20.
N-body codes are routinely used for simulation studies of physical systems, e.g. in the fields of computational astrophysics and molecular dynamics. Typically, they require only a moderate amount of run-time memory, but are very demanding in computational power. A detailed analysis of N-body code performance, in terms of the relative weight of each task of the code and how this weight is influenced by software or hardware optimisations, is essential in improving such codes. The approach of developing a dedicated device, GRAPE [J. Makino, M. Taiji, Scientific Simulations with Special Purpose Computers, Wiley, New York, 1998], able to provide very high performance for the most expensive computational task of this code, has resulted in a dramatic performance leap. We explore the performance of different versions of parallel N-body codes, where both software and hardware improvements are introduced. The use of GRAPE as a ‘force computation accelerator’ in a parallel computer architecture can be seen as an example of a hybrid architecture, where special-purpose device boards help a general-purpose (multi)computer reach very high performance.
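The expensive kernel that GRAPE accelerates is the O(N²) pairwise force summation; a direct-summation sketch (with G = 1 and a softening length to avoid singularities, as is common in such codes — not GRAPE's actual pipeline):

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """O(N^2) pairwise gravitational accelerations with G = 1.
    pos: (N, 3) positions; mass: (N,) masses; eps: softening length."""
    d = pos[None, :, :] - pos[:, None, :]      # d[i, j] = r_j - r_i
    r2 = (d ** 2).sum(axis=-1) + eps ** 2      # softened squared distances
    np.fill_diagonal(r2, np.inf)               # suppress self-interaction
    inv_r3 = r2 ** -1.5
    return (d * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

# Two equal unit masses one unit apart attract each other symmetrically
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
acc = accelerations(pos, np.ones(2))
```

Offloading just this kernel to a special-purpose board while the host handles time integration and I/O is the hybrid-architecture pattern the abstract describes.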


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号