
1.

Data management and control-flow aspects of an SIMD/SPMD parallel language/compiler

Nichols M.A. Siegel H.J. Dietz H.G. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(2):222-234

Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines.

2.

Task Scheduling on the PASM Parallel Processing System

《IEEE transactions on pattern analysis and machine intelligence》1985,(2):145-157

PASM is a proposed large-scale distributed/parallel processing system which can be partitioned into independent SIMD/MIMD machines of various sizes. One design problem for systems such as PASM is task scheduling. The use of multiple FIFO queues for nonpreemptive task scheduling is described. Four multiple-queue scheduling algorithms with different placement policies are presented and applied to the PASM parallel processing system. Simulation of a queueing network model is used to compare the performance of the algorithms. Their performance is also considered in the case where there are faulty control units and processors. The multiple-queue scheduling algorithms can be adapted for inclusion in other multiple-SIMD and partitionable SIMD/MIMD systems that use similar types of interconnection networks to those being considered for PASM.

3.

Mapping computer-vision-related tasks onto reconfigurable parallel-processing systems

Siegel H.J. Armstrong J.B. Watson D.W. 《Computer》1992,25(2):54-63

A tutorial overview of how selected computer-vision-related algorithms can be mapped onto reconfigurable parallel-processing systems is presented. The reconfigurable parallel-processing system assumed for the discussions is a multiprocessor system capable of mixed-mode parallelism; that is, it can operate in either the SIMD or MIMD modes of parallelism and can dynamically switch between modes at instruction-level granularity with generally negligible overhead. In addition, it can be partitioned into independent or communicating submachines, each having the same characteristics as the original machine. Furthermore, this reconfigurable system model uses a flexible multistage cube interconnection network, which allows the connection patterns among the processors to be varied. It is demonstrated how reconfigurability can be used by reviewing and examining five computer-vision-related algorithms, each one emphasizing a different aspect of reconfigurability.

4.

Experimental application-driven architecture analysis of an SIMD/MIMD parallel processing system

Bronson E.C. Casavant T.L. Jamieson L.H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):195-205

An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computationally intensive algorithms such as the FFT.

5.

Hierarchical multiple-SIMD architecture for image analysis

Graham Nudd Nick Francis Tim Atherton Darren Kerbyson Roger Packwood John Vaudin 《Machine Vision and Applications》1992,5(2):85-103

Real-time image analysis requires the use of massively parallel machines. Conventional parallel machines consist of an array of identical processors organized in either single instruction multiple data (SIMD) or multiple instruction multiple data (MIMD) configurations. Machines of this type generally operate effectively on only part of the image analysis problem: SIMD on the low-level processing and MIMD on the high-level processing. In this paper we describe the Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple-SIMD (MSIMD) organization which can operate effectively at all levels of the image analysis problem.

6.

Large-scale parallel processing systems

《Microprocessors and Microsystems》1987,11(1):3-20

Parallel processing is an area of growing interest to the computer science and engineering communities. This paper is an introduction to some of the concepts involved in the design and use of large-scale parallel systems. Parallel machines that are classified as SIMD (synchronous) and MIMD (asynchronous) systems, composed of a large number of microprocessors, are explored. Parallel algorithms are examined, using image smoothing, recursive doubling and contour tracing as examples. Single stage and multistage networks are discussed. The single stage Cube, PM2I, Four Nearest Neighbor and Shuffle-Exchange networks are presented, and the multistage Cube network is described. Case studies of three microprocessor-based systems are given as examples of parallel machine designs, specifically the MPP SIMD machine, the Ultracomputer MIMD system, and the PASM SIMD/MIMD machine.
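
The single stage networks named in this abstract are usually described by interconnection functions on processor addresses. The abstract itself gives no definitions, so the following is only a minimal sketch using the common textbook forms, assuming N = 2^m processors numbered 0 to N-1; the function names are hypothetical.

```python
def cube(p, i):
    """Cube_i: connect processor p to the address that differs in bit i."""
    return p ^ (1 << i)


def pm2i(p, i, n, sign=+1):
    """PM2I ("plus-minus 2^i"): connect p to (p +/- 2^i) mod n."""
    return (p + sign * (1 << i)) % n


def shuffle(p, m):
    """Shuffle: rotate the m-bit address of p left by one position."""
    msb = (p >> (m - 1)) & 1
    return ((p << 1) & ((1 << m) - 1)) | msb


def exchange(p):
    """Exchange: flip the least significant address bit."""
    return p ^ 1
```

For example, with N = 8 (m = 3), `cube(5, 1)` gives 7 and `shuffle(5, 3)` gives 3.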

7.

Minimizing communication in the bitonic sort

Jae-Dong Lee Batcher K.E. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(5):459-474

This paper presents bitonic sorting schemes for special-purpose parallel architectures such as sorting networks and for general-purpose parallel architectures such as SIMD and/or MIMD computers. First, bitonic sorting algorithms for shared-memory SIMD and/or MIMD computers are developed. Shared-memory accesses through the interconnection network of shared memory SIMD and/or MIMD computers can be very time consuming. A scheme is introduced which reduces the number of such accesses. This scheme is based on the parity strategy which is the main idea of the paper. By reducing the communication through the network, a performance improvement is achieved. Second, a recirculating bitonic sorting network is presented, which is composed of one level of N/2 comparators plus an Ω-network of (log N-1) switch levels. This network reduces the cost complexity to O(N log N) compared with the O(N log² N) of the original bitonic sorting network, while preserving the same time complexity. Finally, a simplified multistage bitonic sorting network is presented. For simplifying the interlevel wiring, the parity strategy is used, so N/2 keys are wired straight through the network.
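
For orientation, the original bitonic sorting network that this abstract uses as its baseline can be sketched as below. This is the classic compare-exchange schedule simulated sequentially, not the paper's parity strategy or recirculating network; the function name is hypothetical and the input length is assumed to be a power of two.

```python
def bitonic_sort(keys):
    """Sort a list whose length is a power of two with the classic bitonic network."""
    a = list(keys)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements in this level
        while j >= 1:
            # One comparator level: N/2 independent compare-exchange operations.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Each pass of the inner `while` loop corresponds to one level of N/2 comparators, and there are log N (log N + 1)/2 such levels, which is where the O(N log² N) comparator count of the original network comes from.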

8.

Mapping Conjugate Gradient Algorithms for Neutron Diffusion Applications onto SIMD, MIMD, and Mixed-Mode Machines

John E. So Thomas J. Downar Raghunandan Janardhan Howard Jay Siegel 《International journal of parallel programming》1998,26(2):183-207

The performance of conjugate gradient (CG) algorithms for the solution of the system of linear equations that results from the finite-differencing of the neutron diffusion equation was analyzed on SIMD, MIMD, and mixed-mode parallel machines. A block preconditioner based on the incomplete Cholesky factorization was used to accelerate the conjugate gradient search. The issues involved in mapping both the unpreconditioned and preconditioned conjugate gradient algorithms onto the mixed-mode PASM prototype, the SIMD MasPar MP-1, and the MIMD Intel Paragon XP/S are discussed. On PASM, the mixed-mode implementation outperformed either SIMD or MIMD alone. Theoretical performance predictions were analyzed and compared with the experimental results on the MasPar MP-1 and the Paragon XP/S. Other issues addressed include the impact on execution time of the number of processors used, the effect of the interprocessor communication network on performance, and the relationship of the number of processors to the quality of the preconditioning. Applications studies such as this are necessary in the development of software tools for mapping algorithms onto either a single parallel machine or a heterogeneous suite of parallel machines.
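
As background for the algorithm being mapped, a minimal sequential sketch of the unpreconditioned conjugate gradient iteration follows. It assumes a small dense symmetric positive-definite matrix stored as nested lists; it does not reproduce the paper's block incomplete-Cholesky preconditioner or any of its parallel mappings, and all names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, equal to b because x starts at 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The matrix-vector product and the two inner products per iteration are the steps that require interprocessor communication when the vectors are distributed, which is why the choice of machine and network matters for this algorithm.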

9.

A method for SIMD/MIMD functionally reconfigurable multimicroprocessor systems design and parallel data exchange algorithms

Nikola K Kasabov 《Parallel Computing》1985,2(1):73-78

In SIMD/MIMD functionally reconfigurable multimicroprocessor systems /MMPS/ some of the microprocessor modules /MPM/ can execute a common program /SIMD mode/ while the rest of the MPMs are executing their own programs /MIMD mode/. Every MPM can at any moment be functionally reconfigured from one mode to the other. In this paper the problems of designing such MMPSs are discussed, as well as some realisations of a data exchange module as a register module and some algorithms for parallel data exchange between the MPMs. A hierarchically structured MMPS is developed.

10.

A parallel algorithm for graph matching and its MasPar implementation

Allen R. Cinque L. Tanimoto S. Shapiro L. Yasuda D. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):490-501

Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively-parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines.

11.

Parallel processing with minicomputers

Alan Colvin 《Computer Communications》1980,3(3):127-131

The four general methods of parallel processing are described. These are: single instruction single data stream (SISD), multiple instruction single data stream (MISD), single instruction multiple data stream (SIMD), and multiple instruction multiple data stream (MIMD). Most single computers in use are SISD machines, while most parallel processing applications use the MIMD approach. The paper outlines and compares the four basic MIMD architectures: tightly coupled, loosely coupled, voting, and peripheral processing. The latter is one of the most practical methods using existing minicomputers. Many modern programmable intelligent I/O controllers are peripheral processors. For a computer manufacturer to remain competitive, it is concluded that he will have to include such devices in his hardware.

12.

Development and application of parallel processing

《Data Processing》1986,28(8):405-409

Parallel processing can reduce the cost of computing, improve reliability and provide greater throughput. Equally important is the likelihood that real-life problems contain parallelism that should be exploited in the computer architecture if natural solutions are to be found. Many parallel machines are now on the market, both single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD), but software engineering techniques have yet to make their complete impact on these architectures.

13.

Performance Measures for Evaluating Algorithms for SIMD Machines

《IEEE transactions on pattern analysis and machine intelligence》1982,(4):319-331

This paper examines measures for evaluating the performance of algorithms for single instruction stream–multiple data stream (SIMD) machines. The SIMD mode of parallelism involves using a large number of processors synchronized together. All processors execute the same instruction at the same time; however, each processor operates on a different data item. The complexity of parallel algorithms is, in general, a function of the machine size (number of processors), problem size, and type of interconnection network used to provide communications among the processors. Measures which quantify the effect of changing the machine-size/problem-size/network-type relationships are therefore needed. A number of such measures are presented and are applied to an example SIMD algorithm from the image processing problem domain. The measures discussed and compared include execution time, speed, parallel efficiency, overhead ratio, processor utilization, redundancy, cost effectiveness, speed-up of the parallel algorithm over the corresponding serial algorithm, and an additive measure called "sprice" which assigns a weighted value to computations and processors.
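
Two of the measures listed here have widely used definitions that can be written down directly. The sketch below assumes the conventional forms (speed-up as serial time over parallel time, efficiency as speed-up per processor); the abstract does not state the paper's exact formulas, and the other measures, including "sprice", are not attempted. Function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Conventional speed-up: serial execution time over parallel execution time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Conventional parallel efficiency: speed-up divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs
```

For instance, a task taking 100 time units serially and 10 units on 16 processors has a speed-up of 10 and an efficiency of 0.625 under these definitions.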

14.

Eliminating memory fragmentation within partitionable SIMD/SPMD machines

Nichols M.A. Siegel H.J. Dietz H.G. Quong R.W. Nation W.G. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(3):290-303

Efficient data layout is an important aspect of the compilation process. A model for the creation of perfect memory maps for large-scale parallel machines capable of user-controlled partitionable single-instruction-multiple data/single-program-multiple data (SIMD/SPMD) operation is developed. The term perfect implies that no memory fragmentation occurs and ensures that the memory map size is kept to a minimum. A major constraint on solving this problem is based on the single program nature of both the SIMD and SPMD modes of parallelism. It is assumed that all processors within the same submachine use identical addresses to access corresponding data items in each of their local memories. Necessary and sufficient conditions are derived for being able to create perfect memory maps, and results are applied to several partitionable interconnection networks.

15.

A survey of parallel computer architectures

Duncan R. 《Computer》1990,23(2):5-16

An attempt is made to place recent architectural innovations in the broader context of parallel architecture development by surveying the fundamentals of both newer and more established parallel computer architectures and by placing these architectural alternatives in a coherent framework. The primary emphasis is on architectural constructs rather than specific parallel machines. Three categories of architecture are defined and discussed: synchronous architectures, comprising vector, SIMD (single-instruction-stream, multiple-data-stream) and systolic machines; MIMD (multiple-instruction-stream, multiple-data-stream) with either distributed or shared memory; and MIMD-based paradigms, comprising MIMD/SIMD hybrid, dataflow, reduction, and wavefront types.

16.

Parallel Implementations of Block-Based Motion Vector Estimation for Video Compression on Four Parallel Processing Systems

Min Tan Janet M. Siegel Howard Jay Siegel 《International journal of parallel programming》1999,27(3):195-225

Parallel algorithms, based on a distributed memory machine model, for an exhaustive search technique for motion vector estimation in video compression are being designed and evaluated. Results from the execution on a 16,384 processor MasPar MP-1 (an SIMD machine), a 140 node Intel Paragon XP/S and a 16 node IBM SP2 (two MIMD machines), and the 16 processor PASM prototype (a partitionable SIMD/MIMD mixed-mode machine) are presented. The trade-offs of using different modes of parallelism (SIMD, SPMD, and mixed-mode) and different data partitioning schemes (the rectangular and stripe subimage methods) are examined. The analytical and experimental results shown in this application study will help practitioners to predict and contrast the performance of different approaches to parallel implementation of this important video compression technique. The results presented are also applicable to a large class of image and video processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping a set of independent application tasks, or the subtasks of a single application task, onto a heterogeneous suite of parallel machines.
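
The exhaustive search technique referred to here can be sketched sequentially for a single block as follows. The abstract does not name the matching criterion, so the sum of absolute differences (SAD) used below is an assumption, as are the function names and the representation of frames as nested lists of pixel values.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, bs, search):
    """Try every displacement within +/- search pixels; return (cost, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if by + dy < 0 or by + dy + bs > h or bx + dx < 0 or bx + dx + bs > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Each block's search is independent of every other block's, so assigning rectangular or stripe subimages to processors, as the abstract describes, parallelizes the outer loop over blocks.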

17.

Comparing SIMD and MIMD Programming Modes

《Journal of Parallel and Distributed Computing》1996,35(1):91-96

The Connection Machine CM-5 supports both SIMD and MIMD programming modes. The SIMD mode is simulated over the MIMD mode. This simulation is likely to lead to a loss of performance in SIMD programs. This paper describes a comparison of the two programming modes with CM Fortran and message-passing Fortran. Two kinds of benchmarks are discussed. The first kind consists of synthetic benchmarks in which we measure time for basic arithmetic operations and communication time. The second kind consists of application benchmarks. The experimental results conclusively show that message-passing Fortran performs considerably better than CM Fortran. While the CM-5 is obsolete, these issues show up in the T3D and other current machines.

18.

A survey of reconfigurable optical interconnection network architectures for HPC and DC

Cao Jijun 《Computer Engineering and Science》2022,44(6):951-963

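The interconnection network is one of the core components of high-performance computing (HPC) systems and data centers (DC), and is the global infrastructure that determines overall system performance. With the rapid development of high-performance computing, cloud computing, and big data technologies, traditional electrical interconnection networks can no longer meet the large-scale, scalable communication demands of HPC applications and data center services in terms of performance, energy consumption, and cost, and face severe challenges. In response, researchers have in recent years proposed a variety of reconfigurable optical interconnection network architectures for HPC and data centers. This paper first explains the advantages of optical interconnection networks over electrical ones, then introduces several typical reconfigurable optical interconnection network architectures and analyzes and compares their characteristics, and finally discusses the development trends of reconfigurable optical interconnection networks.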

19.

Proteus: A reconfigurable computational network for computer vision

Robert M. Haralick Arun K. Somani Craig Wittenbrink Robert Johnson Kenneth Cooper Linda G. Shapiro Ihsin T. Phillips Jenq-Neng Hwang William Cheung Yung Hsi Yao Chung-Ho Chen Larry Yang Brian Daugherty Bob Lorbeski Kent Loving Tom Miller Larye Parkins Steve Soos 《Machine Vision and Applications》1995,8(2):85-100

The Proteus architecture is a highly parallel, multiple instruction, multiple data machine (MIMD) optimized for large granularity tasks such as machine vision and image processing. The system can achieve 20 gigaflops (80 gigaflops peak). It accepts data via multiple serial links at a rate of up to 640 MB/s. The system employs a hierarchical reconfigurable interconnection network with the highest level being a circuit-switched enhanced hypercube, serial interconnection network for internal data transfers. The system is designed to use 256 to 1024 RISC processors. The processors use 1-MB external read/write allocating caches for reduced multiprocessor contention. The system detects, locates, and replaces faulty subsystems using redundant hardware to facilitate fault tolerance. The parallelism is directly controllable through an advanced software system for partitioning, scheduling, and development. System software includes a translator for the INSIGHT language, a parallel debugger, low- and high-level simulators, and a message-passing system for all control needs. Image-processing application software includes a variety of point operators, neighborhood operators, convolution, and the mathematical morphology operations of binary and gray-scale dilation, erosion, opening, and closing.

20.

Design of a reconfigurable network structure in a multimedia-oriented parallel acceleration system

Zhang Jing Gao Wen 《Journal of Computer Research and Development》1995,32(10):16-21

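This paper discusses the design of the hardware platform of a parallel acceleration system for multimedia data processing. Using digital signal processing chips as the basic working units, it proposes a reconfigurable network structure based on a mesh array together with its control method, and gives a qualitative analysis of its performance.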