Similar literature
 20 similar documents found (search time: 203 ms)
1.
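Applying the data-parallel model to MIMD machines, with loose synchronization in SPMD mode, is attracting growing attention. This paper presents the design and implementation of Multi-c, a data-parallel language targeting heterogeneous parallel systems. The Multi-c compiler, currently being implemented, works as a precompiler: it accepts program specifications written in SIMD form, relaxes their synchronization requirements, and generates C programs that run in SPMD mode on the parallel system.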
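
To make the SIMD-to-SPMD idea in the abstract above concrete, here is a minimal sketch in Python rather than in Multi-c or the generated C. It is not the Multi-c compiler's actual output: the names `block_range` and `spmd_add` and the block-partitioning scheme are assumptions chosen for illustration. Every process runs the same function and is distinguished only by its rank, and no per-element synchronization is needed.

```python
def block_range(rank, nprocs, n):
    """Half-open index range [start, stop) owned by process `rank`.

    Hypothetical helper: splits n elements into nprocs contiguous blocks
    whose sizes differ by at most one.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def spmd_add(rank, nprocs, a, b, out):
    """Same program on every process; each one updates only its own block."""
    start, stop = block_range(rank, nprocs, len(a))
    for i in range(start, stop):
        out[i] = a[i] + b[i]


if __name__ == "__main__":
    a, b = list(range(10)), [1] * 10
    out = [0] * 10
    # Simulate three processes one after another; a real SPMD run would
    # execute them concurrently and synchronize only at the end.
    for rank in range(3):
        spmd_add(rank, 3, a, b, out)
    print(out)
```

Because the blocks are disjoint, the processes only need to synchronize once after the loop, which is the kind of relaxed synchronization the abstract refers to.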

2.
One of the challenges for parallel compilers and compiler-related tools is, given a machine-independent parallel language, to generate executable code for a variety of computational models, and to identify those specific parallel modes for which a program is well-suited. One portion of this problem, developing a method for estimating the relative execution time of a data-parallel algorithm in an environment capable of the SIMD and SPMD (MIMD) modes of parallelism, is presented. Given a data-parallel program in a language whose syntax is mode-independent and empirical information about instruction execution time characteristics, the goal is to use static source-code analysis to determine an implementation that results in an optimal execution time for a mixed-mode machine capable of SIMD and SPMD parallelism. Statistical information about individual operation execution times and paths of execution through a parallel program is assumed. A secondary goal of this study is to indicate language, algorithm, and machine characteristics that must be researched to learn how to provide the information needed to obtain an optimal assignment of parallel modes to program segments.

3.
Parallel algorithms, based on a distributed memory machine model, for an exhaustive search technique for motion vector estimation in video compression are being designed and evaluated. Results from the execution on a 16,384 processor MasPar MP-1 (an SIMD machine), a 140 node Intel Paragon XP/S and a 16 node IBM SP2 (two MIMD machines), and the 16 processor PASM prototype (a partitionable SIMD/MIMD mixed-mode machine) are presented. The trade-offs of using different modes of parallelism (SIMD, SPMD, and mixed-mode) and different data partitioning schemes (the rectangular and stripe subimage methods) are examined. The analytical and experimental results shown in this application study will help practitioners to predict and contrast the performance of different approaches to parallel implementation of this important video compression technique. The results presented are also applicable to a large class of image and video processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping a set of independent application tasks, or the subtasks of a single application task, onto a heterogeneous suite of parallel machines.
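
The exhaustive search mentioned in the abstract above can be sketched serially as follows. This is only an illustration, not the paper's implementation: the sum of absolute differences (SAD) as the matching criterion, the function names `sad` and `full_search`, and the frame layout as nested lists are all assumptions.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
        for r in range(size)
        for c in range(size)
    )


def full_search(cur, ref, top, left, size, radius):
    """Try every displacement within +/- radius and keep the lowest SAD.

    Returns ((dy, dx), cost) for the block of `cur` whose top-left corner
    is (top, left), matched against the reference frame `ref`.
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur, ref, top, left, y, x, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_vec, best_cost
```

Each block's search is independent of every other block's, which is why the rectangular and stripe subimage partitionings discussed in the abstract map naturally onto separate processors.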

4.
Functionally reconfigurable general purpose parallel machines (FRPM) can be reconfigured during operation from SIMD to MIMD mode or vice versa (first aspect) and from one interconnection network to another according to the data storing order (second aspect). General purpose machines are considered in order to obtain an arbitrary data exchange between the processing elements they are built of. A model for describing such interconnection networks is presented. A full-information exchange network is introduced which can be reconfigured programmatically into tree, matrix, cube, linear-neighbourhood, and FFT networks. Some schemes for constructing SIMD/MIMD reconfigurable machines are given. The usefulness of FRPM for image processing and pattern recognition is discussed.

5.
Weems, C.C.; Riseman, E.M.; Hanson, A.R. Computer, 1992, 25(2): 65-68
A hardware architecture that addresses at least part of the potential parallelism in each of the three levels of vision abstraction, low (sensory), intermediate (symbolic), and high (knowledge-based), is described. The machine, called the image understanding architecture (IUA), consists of three different, tightly coupled parallel processors: the content addressable array parallel processor (CAAPP) at the low level, the intermediate communication associative processor (ICAP) at the intermediate level, and the symbolic processing array (SPA) at the high level. The CAAPP and ICAP levels are controlled by an array control unit (ACU) that takes its directions from the SPA level. The SPA is a multiple-instruction multiple-data (MIMD) parallel processor, while the intermediate and low levels operate in multiple modes. The CAAPP operates in single-instruction multiple-data (SIMD) associative or multiassociative mode, and the ICAP operates in single-program multiple-data (SPMD) or MIMD mode.

6.
Parallel processing is an area of growing interest to the computer science and engineering communities. This paper is an introduction to some of the concepts involved in the design and use of large-scale parallel systems. Parallel machines that are classified as SIMD (synchronous) and MIMD (asynchronous) systems, composed of a large number of microprocessors, are explored. Parallel algorithms are examined, using image smoothing, recursive doubling and contour tracing as examples. Single stage and multistage networks are discussed. The single stage Cube, PM21, Four Nearest Neighbor and Shuffle-Exchange networks are presented, and the multistage Cube network is described. Case studies of three microprocessor-based systems are given as examples of parallel machine designs, specifically the MPP SIMD machine, the Ultracomputer MIMD system, and the PASM SIMD/MIMD machine.
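
Recursive doubling, one of the example algorithms named in the abstract above, can be illustrated with a serial simulation of a lock-step prefix sum. This is a generic textbook sketch, not code from the paper; the function name and the choice of prefix sums as the operation are assumptions.

```python
def recursive_doubling_prefix_sum(values):
    """Inclusive prefix sums computed in ceil(log2(n)) lock-step rounds.

    Returns (prefix_sums, rounds). Each list position stands in for one
    processing element (PE) of an SIMD machine.
    """
    result = list(values)
    stride = 1
    rounds = 0
    while stride < len(result):
        # Snapshot the previous round so every PE reads old data,
        # as it would when all PEs execute the same step together.
        previous = list(result)
        for i in range(stride, len(result)):
            result[i] = previous[i] + previous[i - stride]
        stride *= 2  # the communication distance doubles each round
        rounds += 1
    return result, rounds
```

For eight inputs the loop runs three rounds instead of the seven additions a single processor would perform in sequence, which is the source of the technique's logarithmic step count.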

7.
Real-time image analysis requires the use of massively parallel machines. Conventional parallel machines consist of an array of identical processors organized in either single instruction multiple data (SIMD) or multiple instruction multiple data (MIMD) configurations. Machines of this type generally operate effectively on only parts of the image analysis problem: SIMD on the low-level processing and MIMD on the high-level processing. In this paper we describe the Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple-SIMD (MSIMD) organization which can operate effectively at all levels of the image analysis problem.

8.
Data-parallel implementations of the computationally intensive task of solving multiple quadratic forms (MQFs) have been examined. Coupled and uncoupled parallel methods are investigated, where coupling relates to the degree of interaction among the processors. Also, the impact of partitioning a large MQF problem into smaller non-interacting subtasks is studied. Trade-offs among the implementations for various data-size/machine-size ratios are categorized in terms of complex arithmetic operation counts, communication overhead, and memory storage requirements. Furthermore, the impact on performance of the mode of parallelism used is considered, specifically, SIMD versus MIMD versus SIMD/MIMD mixed-mode. From the complexity analyses, it is shown that none of the algorithms presented in this paper is best for all data-size/machine-size ratios. Thus, to achieve scalability (i.e., good performance as the number of processors available in a machine increases), instead of using a single algorithm, the approach discussed is to have a set of algorithms from which the most appropriate algorithm or combination of algorithms is selected based on the ratio calculated from the scaled machine size. The analytical results have been verified by experiments on the MasPar MP-1 (SIMD), nCUBE 2 (MIMD), and PASM (mixed-mode) prototype.

9.
Performance of a parallel algorithm on a parallel machine depends not only on the time complexity of the algorithm, but also on how the underlying machine supports the fundamental operations used by the algorithm. This study analyzes various mappings of image correlation algorithms in SIMD, MIMD, and mixed-mode environments. Experiments were conducted on the Intel Paragon, MasPar MP-1, nCUBE 2, and PASM prototype. The machine features considered in this study include: modes of parallelism, communication/computation ratio, network topology and implementation, SIMD CU/PE overlap, and communication/computation overlap. Performance of an implementation can be enhanced by using algorithmic techniques that match the machine features. Some algorithmic techniques discussed here are additional communication versus redundant computation, data block transfers, and communication/computation overlap. The results presented are applicable to a large class of image processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping the subtasks of an application task, or a set of independent application tasks, onto a heterogeneous suite of parallel machines.

10.
Efficient data layout is an important aspect of the compilation process. A model for the creation of perfect memory maps for large-scale parallel machines capable of user-controlled partitionable single-instruction multiple-data/single-program multiple-data (SIMD/SPMD) operation is developed. The term perfect implies that no memory fragmentation occurs and ensures that the memory map size is kept to a minimum. A major constraint on solving this problem is based on the single program nature of both the SIMD and SPMD modes of parallelism. It is assumed that all processors within the same submachine use identical addresses to access corresponding data items in each of their local memories. Necessary and sufficient conditions are derived for being able to create perfect memory maps, and results are applied to several partitionable interconnection networks.

11.
The data parallel meta language (DPML) and its associated Fortran source code rewriter (DP77) support architecture independent, high performance climate and weather prediction models. The language allows the data domains over which a program operates, the communication patterns required between elements of those data domains, and some or all of the calculations of a program to be expressed at a very high level. DPML uses explicit data parallelism to express the inherent parallelism of the models, with the result that programs are easily compilable into target machine code. DP77 uses information from the DPML program to translate Fortran routines into the host specific Fortran form required for their parallel execution within the model. This paper describes the general strategy behind the development of DPML, discusses its language features using examples drawn from climate modelling, and provides details of the mechanism it uses for incorporating Fortran into data parallel programs. Encouraging results are reported for DPML versions of the standard weather benchmark models executing on vector, SIMD, and MIMD (shared memory) machines. While the paper is set within the framework of climate modelling, the technique has obvious wider implications.

12.
A survey of parallel computer architectures (total citations: 1; self-citations: 0; citations by others: 1)
Duncan, R. Computer, 1990, 23(2): 5-16
An attempt is made to place recent architectural innovations in the broader context of parallel architecture development by surveying the fundamentals of both newer and more established parallel computer architectures and by placing these architectural alternatives in a coherent framework. The primary emphasis is on architectural constructs rather than specific parallel machines. Three categories of architecture are defined and discussed: synchronous architectures, comprising vector, SIMD (single-instruction-stream, multiple-data-stream) and systolic machines; MIMD (multiple-instruction-stream, multiple-data-stream) with either distributed or shared memory; and MIMD-based paradigms, comprising MIMD/SIMD hybrid, dataflow, reduction, and wavefront types.

13.
An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computationally intensive algorithms such as the FFT.

14.
PASM is a proposed large-scale distributed/parallel processing system which can be partitioned into independent SIMD/MIMD machines of various sizes. One design problem for systems such as PASM is task scheduling. The use of multiple FIFO queues for nonpreemptive task scheduling is described. Four multiple-queue scheduling algorithms with different placement policies are presented and applied to the PASM parallel processing system. Simulation of a queueing network model is used to compare the performance of the algorithms. Their performance is also considered in the case where there are faulty control units and processors. The multiple-queue scheduling algorithms can be adapted for inclusion in other multiple-SIMD and partitionable SIMD/MIMD systems that use similar types of interconnection networks to those being considered for PASM.

15.
The paper presents a parallel programming methodology that ensures easy programming, efficiency and portability of programs to different machines belonging to the class of the general-purpose, distributed-memory, MIMD architectures. The methodology is based on the definition of a new, high-level, explicitly parallel language, called P3L, and of a set of static tools that automatically adapt the program features for each target architecture. P3L does not require programmers to specify process activations, the actual parallelism degree, scheduling, or interprocess communications, i.e. all those features that need to be adjusted to harness each specific target machine. Parallelism is, on the other hand, expressed in a structured and qualitative way, by hierarchical composition of a restricted set of language constructs, corresponding to those forms of parallelism that are frequently encountered in parallel applications, and that can be efficiently implemented. The efficient portability of P3L applications is guaranteed by the compiler along with the novel structure of the support. The compiler automatically adapts the program features for each specific architecture, using the costs (in terms of performance) of the low-level mechanisms exported by the architecture itself. In our methodology, these costs, along with other features of the architecture, are viewed through an abstract machine, whose interface is used by the compiler to produce the final object code.

16.
In a SIMD or VLIW machine, conceptual synchronizations are accomplished by using a static code schedule that does not require run-time synchronization. The lack of run-time synchronization overhead makes these machines very effective for fine-grain parallelism, but they cannot execute parallel code structures as general as those executed by MIMD architectures, and this limits their utility. In this paper we present a timing analysis that allows a compiler for a MIMD machine to eliminate a large fraction of the run-time synchronization by making efficient use of static code scheduling. Although these techniques can be adapted to be applied to most MIMD machines, this paper centers on the analysis and scheduling for barrier MIMD machines. Barrier MIMDs are asynchronous multiple instruction stream/multiple data stream architectures capable of parallel execution of variable execution-time instructions and arbitrary control flow (e.g., while loops and calls). However, they also incorporate a special hardware barrier synchronization mechanism that facilitates static scheduling by providing a mechanism which the compiler can use to enforce precise timing constraints. In other words, the compiler tracks relative timing between processors and uses static code scheduling until the timing imprecision becomes too large, at which point the compiler simply inserts a barrier to reduce that timing imprecision to zero (or a small constant). This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Ellis 1985]. In addition, the experimental results from scheduling thousands of synthetic benchmark programs for a parameterized barrier MIMD machine are presented.

17.
This paper examines measures for evaluating the performance of algorithms for single instruction stream–multiple data stream (SIMD) machines. The SIMD mode of parallelism involves using a large number of processors synchronized together. All processors execute the same instruction at the same time; however, each processor operates on a different data item. The complexity of parallel algorithms is, in general, a function of the machine size (number of processors), problem size, and type of interconnection network used to provide communications among the processors. Measures which quantify the effect of changing the machine-size/problem-size/network-type relationships are therefore needed. A number of such measures are presented and are applied to an example SIMD algorithm from the image processing problem domain. The measures discussed and compared include execution time, speed, parallel efficiency, overhead ratio, processor utilization, redundancy, cost effectiveness, speed-up of the parallel algorithm over the corresponding serial algorithm, and an additive measure called "sprice" which assigns a weighted value to computations and processors.
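
Two of the measures listed in the abstract above, speed-up and parallel efficiency, can be computed as in the short sketch below. The formulas are the common textbook definitions (speed-up as serial time over parallel time, efficiency as speed-up per processor); the paper's exact definitions may differ, and the function names are chosen here for illustration only.

```python
def speedup(serial_time, parallel_time):
    """Speed-up of the parallel algorithm over the serial one."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processors):
    """Speed-up per processor; 1.0 means ideal linear scaling."""
    return speedup(serial_time, parallel_time) / processors


if __name__ == "__main__":
    # Hypothetical timings: 64 s serially, 8 s on 16 processors.
    print(speedup(64.0, 8.0))
    print(efficiency(64.0, 8.0, 16))
```

Evaluating these two functions over a range of machine sizes for a fixed problem size is one simple way to see the machine-size/problem-size relationship the abstract describes.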

18.
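Image processing, machine vision, and 3D graphics rendering all exhibit large-scale parallelism. By fully exploiting the programmability and flexible parallel processing modes of the polymorphic array architecture for graphics and image processing (PAAG) processor, and by adopting a parallelization design method that combines operation-level and data-level parallelism, the OpenVX kernel functions and 3D graphics rendering were implemented. Experimental results show that, for the parallel implementations of the standard OpenVX image processing kernel functions and of graphics rendering, the multiple-instruction multiple-data (MIMD) mode of the PAAG processor achieves linear speedup with a slope of 1, which is more efficient than the nonlinear speedup with a slope below 1 obtained with the single-instruction multiple-data (SIMD) mode of conventional graphics processing units (GPUs).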

19.
This paper examines the applicability of fine-grained tree-structured SIMD machines, which are amenable to highly efficient VLSI implementation, to several low-level image understanding tasks. Algorithms are presented for histogramming, thresholding, image correlation, connected component labeling, and computing Euler number. A particular massively parallel machine called NON-VON is used for purposes of explication and performance evaluation. Only NON-VON tree-structured communication capabilities and its SIMD mode of execution are considered in this paper. Novel algorithmic techniques are described, such as vertical pipelining, subproblem partitioning, associative matching, and data duplication, that effectively exploit the massive parallelism available in fine-grained SIMD tree machines while avoiding communication bottlenecks. Simulation results are presented and compared with results obtained or forecast for other highly parallel machines. The relative advantages and limitations of the class of machines under consideration are outlined; except for some types of image correlation, the fine-grained SIMD tree is exceptionally fast.

20.
The Connection Machine CM-5 supports both SIMD and MIMD programming modes. The SIMD mode is simulated over the MIMD mode. This simulation is likely to lead to a loss of performance in SIMD programs. This paper describes a comparison of the two programming modes with CM Fortran and message-passing Fortran. Two kinds of benchmarks are discussed. The first kind consists of synthetic benchmarks in which we measure time for basic arithmetic operations and communication time. The second kind consists of application benchmarks. The experimental results conclusively show that message-passing Fortran performs considerably better than CM Fortran. While the CM-5 is obsolete, these issues show up in the T3D and other current machines.
