期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Alan Colvin 《Computer Communications》1980,3(3):127-131

The four general methods of parallel processing are described. These are: single instruction single data stream (SISD), multiple instruction single data stream (MISD), single instruction multiple data stream (SIMD), and multiple instruction multiple data stream (MIMD). Most single computers in use are SISD machines, while most parallel processing applications use the MIMD aproach. The paper outlines and compares the four basic MIMD architectures: tightly coupled, loosely coupled, voting, and peripheral processing. The latter is one of the most practical methods using existing minicomputers. Many modern programmable intelligent I/O controllers are peripheral processors. For a computer manufacturer to remain competitive, it is concluded that he will have to include such devices in his hardware. 相似文献

2.

Development and application of parallel processing

《Data Processing》1986,28(8):405-409

Parallel processing can reduce the cost of computing, improve reliability and provide greater throughput. Equally important is the likelihood that real-life problems contain parallelism that should be exploited in the computer architecture if natural solutions are to be found. Many parallel machines are now on the market, both single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD), but software engineering techniques have yet to make their complete impact on these architectures. 相似文献

3.

Parallel computer architectures for image processing

《Computer Vision, Graphics, and Image Processing》1984,25(1):68-88

Image processing problems frequently involve large structured arrays of data and a need for very rapid computation. Special parallel processing schemes have evolved over the last 20 years to deal with these problems. In this paper many parallel systems which have been developed for image processing are outlined and the features of their underlying architectures are discussed. Most of these special architectures may be loosely classified as either SIMD or pipeline structures although some MIMD structures have been designed for high level image analysis. In recent years several multiple SIMD (MSIMD) schemes have been proposed as suitable architectures for image processing. The fundamental problems of developing an effective MSIMD system are discussed and a simple SIMD/MIMD computational model for comparison with such systems is proposed. 相似文献

4.

Functionally reconfigurable general purpose parallel machines and some image processing and pattern recognition applications

Nikola K Kasabov 《Pattern recognition letters》1985,3(3):215-223

Functionally reconfigurable general purpose parallel machines (FRPM) could be reconfigured during the operation from SIMD to MIMD mode or vice versa (first aspect) and from one interconnection network to another according to the data storing order (second aspect). General purpose machines are considered in order to obtain an arbitrary data exchange between the processing elements they are built of. A model for describing such interconnection networks is presented. A full-information exchange network in introduced which is reconfigurable in a programming way to tree-, matrix-, cube-, linear-neighbourhood and FFT-network. Some schemes for constructing SIMD/MIMD reconfigurable machines are given. The usefullness of using FRMP for image processing and pattern recognition is discussed. 相似文献

5.

Data management and control-flow aspects of an SIMD/SPMD parallellanguage/compiler

Nichols M.A. Siegel H.J. Dietz H.G. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(2):222-234

Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines 相似文献

6.

A class of SIMD machines simulated by systolic arrays

Hiroshi Umeo 《Journal of Parallel and Distributed Computing》1985,2(4):391-403

In this paper we introduce a new subclass of single instruction steam/multiple data stream (SIMD) machines, referred to as a simple SIMD, then consider an implementation of a class of simple SIMD parallel algorithms onto systolic arrays, which have been considered as one candidate for VLSI-based cellular computers. The class of simple SIMD algorithms is so large that it includes many conventional SIMD algorithms, such as sorting, image processing, and graph algorithms. We develop several time-efficient algorithms for the simulations of simple SIMD machines, which have global data communications, by systolic arrays with only local data communications. The systolic simulation theorems enable us to use many conventional SIMD algorithms on the systolic arrays with little loss of time efficiency. 相似文献

7.

Large-scale parallel processing systems

《Microprocessors and Microsystems》1987,11(1):3-20

Parallel processing is an area of growing interest to the computer science and engineering communities. This paper is an introduction to some of the concepts involved in the design and use of large-scale parallel systems. Parallel machines that are classified as SIMD (synchronous) and MIMD (asynchronous) systems, composed of a large number of microprocessors, are explored. Parallel algorithms are examined, using image smoothing, recursive doubling and contour tracing as examples. Single stage and multistage networks are discussed. The single stage Cube, PM21, Four Nearest Neighbor and Shuffle-Exchange networks are presented, and the multistage Cube network is described. Case studies of three microprocessor-based systems are given as examples of parallel machine designs, specifically the MPP SIMD machine, the Ultracomputer MIMD system, and the PASM SIMD/MIMD machine. 相似文献

8.

Performance Measures for Evaluating Algorithms for SIMD Machines

《IEEE transactions on pattern analysis and machine intelligence》1982,(4):319-331

This paper examines measures for evaluating the performance of algorithms for single instruction stream–multiple data stream (SIMD) machines. The SIMD mode of parallelism involves using a large number of processors synchronized together. All processors execute the same instruction at the same time; however, each processor operates on a different data item. The complexity of parallel algorithms is, in general, a function of the machine size (number of processors), problem size, and type of interconnection network used to provide communications among the processors. Measures which quantify the effect of changing the machine-size/problem-size/network-type relationships are therefore needed. A number of such measures are presented and are applied to an example SIMD algorithm from the image processing problem domain. The measures discussed and compared include execution time, speed, parallel efficiency, overhead ratio, processor utilization, redundancy, cost effectiveness, speed-up of the parallel algorithm over the corresponding serial algorithm, and an additive measure called "sprice" which assigns a weighted value to computations and processors. 相似文献

9.

On the Complexity of Scheduling MIMD Operations for SIMD Interpretation

《Journal of Parallel and Distributed Computing》1995,29(1):91-95

Programming SIMD hardware to interpret (in parallel) programs and data resident in each PE is a technique for obtaining a cost-effective massively parallel MIMD processing environment. The performance of the synthesized MIMD environment can be greatly improved by with a variable instruction interpreter that delays the interpretation of infrequent operations. In this paper, the process of building a variable instruction interpreter that optimizes an objective function is examined. Two different objective functions are considered, namely, maximizing the total instruction throughput (called Maximal MIMD Instruction Throughput, MMIT) and maximizing overall PE utilization (called Maximal MIMD PE Utilization, MMPU). We show that the decision version of both the MMIT and MMPU problems is NP-complete. 相似文献

10.

A Heterogeneous Mixed-Mode Execution Model for Massively Parallel Systems

《Journal of Parallel and Distributed Computing》1999,56(1):2-16

In this paper, we consider a massively parallel system that is composed of heterogeneous processors, that is, processors with different processing power, and that combines the advantages of the SIMD and MIMD architectures. The heterogeneous mixed-mode (HeMM) execution model is composed of two main components, which operate in the well-known SIMD and MIMD paradigms. The main computing power comes from a component that is composed of a massive number of processors and operates in a data parallel manner. The other component is composed of a few (or even one) fast processors which operate in the MIMD paradigm. The operation of a small number of processors in an MIMD paradigm has been well demonstrated through actual systems. The processors in this component add flexibility to the execution of the parallel programs such that it adjusts to the changing parallelism of the program to enhance the performance. Based on this execution model we analyze the gains in performance that is obtainable by this new system. We show that substantial performance gains can be obtained by using the HeMM system. 相似文献

11.

萤火虫2：一种多态并行机的硬件体系结构

李涛杨婷易学渊蒲林钱博文黄光新黄虎才韩俊刚《计算机工程与科学》2014,36(2):191-200

提出了一种新型的多态高效并行阵列机结构--萤火虫2号阵列机。该结构的处理单元可以在SIMD和MIMD两种模式下运行,兼有异步执行机制,还可以实现分布式指令级并行处理。采用了硬件的多线程管理器和高效通信机制,这些机制使得此种阵列机能够实现效率很高的线程级并行运算、数据级并行运算和分布式指令级并行运算。尤其值得指出的是,此种阵列机的流处理性能堪与专用集成电路匹敌。该结构还能有效实现静态与动态数据流计算,可以高效实现图形、图像和数字信号处理任务。相似文献

12.

Task Scheduling on the PASM Parallel Processing System

《IEEE transactions on pattern analysis and machine intelligence》1985,(2):145-157

PASM is a proposed large-scale distributed/parallel processing system which can be partitioned into independent SIMD/MIMD machines of various sizes. One design problem for systems such as PASM is task scheduling. The use of multiple FIFO queues for nonpreemptive task scheduling is described. Four multiple-queue scheduling algorithms with different placement policies are presented and applied to the PASM parallel processing system. Simulation of a queueing network model is used to compare the performance of the algorithms. Their performance is also considered in the case where there are faulty control units and processors. The multiple-queue scheduling algorithms can be adapted for inclusion in other multiple-SIMD and partitionable SIMD/MIMD systems that use similar types of interconnection networks to those being considered for PASM. 相似文献

13.

Experimental application-driven architecture analysis of anSIMD/MIMD parallel processing system

Bronson E.C. Casavant T.L. Jamieson L.H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):195-205

An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computational intensive algorithms such as the FET 相似文献

14.

Static scheduling for barrier MIMD architectures

Henry G. Dietz Abderrazek Zaafrani Matthew T. O'Keefe 《The Journal of supercomputing》1992,5(4):263-289

In a SIMD or VLIW machine, conceptual synchronizations are accomplished by using a static code schedule that does not require run-time synchronization. The lack of run-time synchronization overhead makes these machines very effective for fine-grain parallelism, but they cannot execute parallel code structures as general as those executed by MIMD architectures, and this limits their utility.In this paper we present a timing analysis that allows a compiler for a MIMD machine to eliminate a large fraction of the run-time synchronization by making efficient use of static code scheduling. Although these techniques can be adapted to be applied to most MIMD machines, this paper centers on the analysis and scheduling for barrier MIMD machines. Barrier MIMDs are asynchronous multiple instruction stream/multiple data stream architectures capable of parallel execution of variable execution-time instructions and arbitrary control flow (e.g., while loops and calls). However, they also incorporate a special hardware barrier synchronization mechanism that facilitates static scheduling by providing a mechanism which the compiler can use to enforce precise timing constraints. In other words, the compiler tracks relative timing between processors and uses static code scheduling until the timing imprecision becomes too large, at which point the compiler simply inserts a barrier to reduce that timing imprecision to zero (or a small constant).This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Ellis 1985]. In addition, the experimental results from scheduling thousands of synthetic benchmark programs for a parameterized barrier MIMD machine are presented. 相似文献

15.

Parallel Implementations of Block-Based Motion Vector Estimation for Video Compression on Four Parallel Processing Systems

Min Tan Janet M. Siegel Howard Jay Siegel 《International journal of parallel programming》1999,27(3):195-225

Parallel algorithms, based on a distributed memory machine model, for an exhaustive search technique for motion vector estimation in video compression are being designed and evaluated. Results from the execution on a 16,384 processor MasPar MP-1 (an SIMD machine), a 140 node Intel Paragon XP/S and a 16 node IBM SP2 (two M IMD machines), and the 16 processor PASM prototype (a partitionable SIMD/MIMD mixed-mode machine) are presented. The trade-offs of using different modes of parallelism (SIMD, SPMD, and mixed-mode) and different data partitioning schemes (the rectangular and stripe subimage methods) are examined. The analytical and experimental results shown in this application study will help practitioners to predict and contrast the performance of different approaches to parallel implementation of this important video compression technique. The results presented are also applicable to a large class of image and video processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping a set of independent application tasks, or the subtasks of a single application task, onto a heterogeneous suite of parallel machines. 相似文献

16.

Boundary element analysis on vector and parallel computers

《Computing Systems in Engineering》1994,5(3):239-252

Boundary element analysis (BEA) can be characterized as a numerical technique that generally shifts the computational burden in the analysis toward numerical integration and the solution of nonsymmetric and either dense or blocked sparse systems of algebraic equations. Researchers have explored the concept that the fundamental characteristics of BEA can be exploited to generate effective implementations on vector and parallel computers. In this paper, the results of some of these investigations are discussed. The performance of overall algorithms for BEA on vector supercomputers, massively data parallel single instruction multiple data (SIMD), and relatively fine grained distributed memory multiple instruction multiple data (MIMD) computer systems is described. Some general trends and conclusions are discussed, along with indications of future developments that may prove fruitful in this regard. 相似文献

17.

A parallel algorithm for graph matching and its MasParimplementation

Allen R. Cinque L. Tanimoto S. Shapiro L. Yasuda D. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):490-501

Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively-parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines 相似文献

18.

OpenVX与三维渲染在多态GPU上的并行实现

延酉玫李涛王鹏博韩俊刚李雪丹姚静乔虹《计算机应用》2015,35(1):53-57

针对图像处理与机器视觉以及三维图形渲染等所具有的大规模并行处理特征,通过充分利用面向图形图像处理的多态阵列架构(PAAG)处理器的可编程性以及灵活的并行处理方式,采用操作级并行与数据级并行相结合的并行化设计方法,实现了OpenVX中Kernel函数以及3D图形渲染.实验结果表明,在OpenVX标准图像处理Kernel函数以及图形渲染的并行实现中,采用PAAG处理器中的多指令多数据(MIMD)并行处理方式可以获得斜率为1的线性加速比,比传统图形处理器(GPU)中单指令多数据(SIMD)并行处理方式所得到的斜率值小于1的非线性加速比效率更高. 相似文献

19.

A survey of parallel computer architectures 总被引：1，自引：0，他引：1

Duncan R. 《Computer》1990,23(2):5-16

An attempt is made to place recent architectural innovations in the broader context of parallel architecture development by surveying the fundamentals of both newer and more established parallel computer architectures and by placing these architectural alternatives in a coherent framework. The primary emphasis is on architectural constructs rather than specific parallel machines. Three categories of architecture are defined and discussed: synchronous architectures, comprising vector, SIMD (single-instruction-stream, multiple-data-stream) and systolic machines; MIMD (multiple-instruction-stream, multiple-data-stream) with either distributed or shared memory; and MIMD-based paradigms, comprising MIMD/SIMD hybrid, dataflow, reduction, and wavefront types 相似文献

20.

Image understanding architecture: exploiting potential parallelismin machine vision

Weems C.C. Riseman E.M. Hanson A.R. 《Computer》1992,25(2):65-68

A hardware architecture that addresses at least part of the potential parallelism in each of the three levels of vision abstraction, low (sensory), intermediate (symbolic), and high (knowledge-based), is described. The machine, called the image understanding architecture (IUA), consists of three different, tightly coupled parallel processors; the content addressable array parallel processor (CAAPP) at the low level, the intermediate communication associative processor (ICAP) at the intermediate level, and the symbolic processing array (SPA) at the high level. The CAAPP and ICAP levels are controlled by an array control unit (ACU) that takes its directions from the SPA level. The SPA is a multiple-instruction multiple-data (MIMD) parallel processor, while the intermediate and low levels operat in multiple modes. The CAAPP operates in single-instruction multiple-data (SIMD) associative or multiassociative mode, and the ICAP operates in single-program multiple-data (SPMD) or MIMD mode 相似文献