首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines  相似文献   

2.
This paper presents bitonic sorting schemes for special-purpose parallel architectures such as sorting networks and for general-purpose parallel architectures such as SIMD and/or MIMD computers. First, bitonic sorting algorithms for shared-memory SIMD and/or MIMD computers are developed. Shared-memory accesses through the interconnection network of shared memory SIMD and/or MIMD computers can be very time consuming. A scheme is introduced which reduces the number of such accesses. This scheme is based on the parity strategy which is the main idea of the paper. By reducing the communication through the network, a performance improvement is achieved. Second, a recirculating bitonic sorting network is presented, which is composed of one level of N/2 comparators plus an Ω-network of (log N-1) switch levels. This network reduces the cost complexity to O(N log N) compared with the O(N log2 N) of the original bitonic sorting network, while preserving the same time complexity. Finally, a simplified multistage bitonic sorting network, is presented. For simplifying the interlevel wiring, the parity strategy is used, so N/2 keys are wired straight through the network  相似文献   

3.
Parallel algorithms, based on a distributed memory machine model, for an exhaustive search technique for motion vector estimation in video compression are being designed and evaluated. Results from the execution on a 16,384 processor MasPar MP-1 (an SIMD machine), a 140 node Intel Paragon XP/S and a 16 node IBM SP2 (two M IMD machines), and the 16 processor PASM prototype (a partitionable SIMD/MIMD mixed-mode machine) are presented. The trade-offs of using different modes of parallelism (SIMD, SPMD, and mixed-mode) and different data partitioning schemes (the rectangular and stripe subimage methods) are examined. The analytical and experimental results shown in this application study will help practitioners to predict and contrast the performance of different approaches to parallel implementation of this important video compression technique. The results presented are also applicable to a large class of image and video processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping a set of independent application tasks, or the subtasks of a single application task, onto a heterogeneous suite of parallel machines.  相似文献   

4.
PASM is a proposed large-scale distributed/parallel processing system which can be partitioned into independent SIMD/MIMD machines of various sizes. One design problem for systems such as PASM is task scheduling. The use of multiple FIFO queues for nonpreemptive task scheduling is described. Four multiple-queue scheduling algorithms with different placement policies are presented and applied to the PASM parallel processing system. Simulation of a queueing network model is used to compare the performance of the algorithms. Their performance is also considered in the case where there are faulty control units and processors. The multiple-queue scheduling algorithms can be adapted for inclusion in other multiple-SIMD and partitionable SIMD/MIMD systems that use similar types of interconnection networks to those being considered for PASM.  相似文献   

5.
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively-parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines  相似文献   

6.
Image processing problems frequently involve large structured arrays of data and a need for very rapid computation. Special parallel processing schemes have evolved over the last 20 years to deal with these problems. In this paper many parallel systems which have been developed for image processing are outlined and the features of their underlying architectures are discussed. Most of these special architectures may be loosely classified as either SIMD or pipeline structures although some MIMD structures have been designed for high level image analysis. In recent years several multiple SIMD (MSIMD) schemes have been proposed as suitable architectures for image processing. The fundamental problems of developing an effective MSIMD system are discussed and a simple SIMD/MIMD computational model for comparison with such systems is proposed.  相似文献   

7.
An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computational intensive algorithms such as the FET  相似文献   

8.
In this paper we discuss the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16-Kbit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally conclusions are presented.  相似文献   

9.
Functionally reconfigurable general purpose parallel machines (FRPM) could be reconfigured during the operation from SIMD to MIMD mode or vice versa (first aspect) and from one interconnection network to another according to the data storing order (second aspect). General purpose machines are considered in order to obtain an arbitrary data exchange between the processing elements they are built of. A model for describing such interconnection networks is presented. A full-information exchange network in introduced which is reconfigurable in a programming way to tree-, matrix-, cube-, linear-neighbourhood and FFT-network. Some schemes for constructing SIMD/MIMD reconfigurable machines are given. The usefullness of using FRMP for image processing and pattern recognition is discussed.  相似文献   

10.
Data-parallel implementations of the computationally intensive task of solving multiple quadratic forms (MQFs) have been examined. Coupled and uncoupled parallel methods are investigated, where coupling relates to the degree of interaction among the processors. Also, the impact of partitioning a large MQF problem into smaller non-interacting subtasks is studied. Trade-offs among the implementations for various data-size/machine-size ratios are categorized in terms of complex arithmetic operation counts, communication overhead, and memory storage requirements. Furthermore, the impact on performance of the mode of parallelism used is considered, specifically, SIMD versus MIMD versus SIMD/MIMD mixed-mode. From the complexity analyses, it is shown that none of the algorithms presented in this paper is best for all data-size/machine-size ratios. Thus, to achieve scalability (i.e., good performance as the number of processors available in a machine increases), instead of using a single algorithm, the approach discussed is to have a set of algorithms from which the most appropriate algorithm or combination of algorithms is selected based on the ratio calculated from the scaled machine size. The analytical results have been verified by experiments on the MasPar MP-1 (SIMD), nCUBE 2 (MIMD), and PASM (mixed-mode) prototype.  相似文献   

11.
The performance of conjugate gradient (CG) algorithms for the solution of the system of linear equations that results from the finite-differencing of the neutron diffusion equation was analyzed on SIMD, MIMD, and mixed-mode parallel machines. A block preconditioner based on the incomplete Cholesky factorization was used to accelerate the conjugate gradient search. The issues involved in mapping both the unpreconditioned and preconditioned conjugate gradient algorithms onto the mixed-mode PASM prototype, the SIMD MasPar MP-1, and the MIMD Intel Paragon XP/S are discussed. On PASM , the mixed-mode implementation outperformed either SIMD or MIMD alone. Theoretical performance predictions were analyzed and compared with the experimental results on the MasPar MP-1 and the Paragon XP/S. Other issues addressed include the impact on execution time of the number of processors used, the effect of the interprocessor communication network on performance, and the relationship of the number of processors to the quality of the preconditioning. Applications studies such as this are necessary in the development of software tools for mapping algorithms onto either a single parallel machine or a heterogeneous suite of parallel machines.  相似文献   

12.
《Data Processing》1986,28(8):405-409
Parallel processing can reduce the cost of computing, improve reliability and provide greater throughput. Equally important is the likelihood that real-life problems contain parallelism that should be exploited in the computer architecture if natural solutions are to be found. Many parallel machines are now on the market, both single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD), but software engineering techniques have yet to make their complete impact on these architectures.  相似文献   

13.
In this paper, we consider a massively parallel system that is composed of heterogeneous processors, that is, processors with different processing power, and that combines the advantages of the SIMD and MIMD architectures. The heterogeneous mixed-mode (HeMM) execution model is composed of two main components, which operate in the well-known SIMD and MIMD paradigms. The main computing power comes from a component that is composed of a massive number of processors and operates in a data parallel manner. The other component is composed of a few (or even one) fast processors which operate in the MIMD paradigm. The operation of a small number of processors in an MIMD paradigm has been well demonstrated through actual systems. The processors in this component add flexibility to the execution of the parallel programs such that it adjusts to the changing parallelism of the program to enhance the performance. Based on this execution model we analyze the gains in performance that is obtainable by this new system. We show that substantial performance gains can be obtained by using the HeMM system.  相似文献   

14.
Real-time image analysis requires the use of massively parallel machines. Conventional parallel machines consist of an array of identical processors organized in either single instruction multiple data (SIMD) or multiple instruction multiple data (MIMD) configurations. Machines of this type generally only operate effectively on parts of the image analysis problem. SIMD on the low level processing and MIMD on the high level processing. In this paper we describe the Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple-SIMD (MSIMD) organization which can operate effectively at all levels of the image analysis problem.  相似文献   

15.
《Parallel Computing》1999,25(13-14):2111-2134
The study of Quantum Chromodynamics (QCD) remains one of the most challenging topics in elementary particle physics. The lattice formulation of QCD, in which space–time is treated as a four-dimensional hypercubic grid of points, provides the means for a numerical solution from first principles but makes extreme demands upon computational performance. High Performance Computing (HPC) offers us the tantalising prospect of a verification of QCD through the precise reproduction of the known masses of the strongly interacting particles. It is also leading to the development of a phenomenological tool capable of disentangling strong interaction effects from weak interaction effects in the decays of one kind of quark into another, crucial for determining parameters of the Standard Model of particle physics.The 1980s saw the first attempts to apply parallel architecture computers to lattice QCD. The SIMD and MIMD machines used in these pioneering efforts were the ICL DAP and the Cosmic Cube, respectively. These were followed by the Connection Machine, the Meiko i860 Computing Surface and the Intel Hypercube. The end of the decade witnessed a rise in the development of special purpose dedicated parallel systems, notably the APE machines in Rome, the Columbia machines, the GF-11 system at IBM Research and the QCDPAX project in Tsukuba. The state-of-the-art is represented by the CP–PACS machine at Tsukuba, and QCDSP, the latest Columbia machine.We give a brief pedagogic review of lattice QCD, outline the computational methodology used and discuss the sources of systematic error that arise in numerical calculations. We outline some of the early calculations and discuss parallel architectures and their application to QCD, giving examples of both commercial and special purpose machines. After a short section on recent developments, we describe state-of-the-art machines and conclude with the prospects for the future.  相似文献   

16.
Networks based on the High Performance Parallel Interface (HIPPI) will become the norm at Los Alamos National Laboratory. The ramification of such a high-speed networking paradigm on scientific visualization are enormous. Not only will scientists have the capability of networked framebuffer animation loops in their offices, but also the partitioning of graphics tasks between MIMD, SIMD, and specialized hardware will be feasible. Of course, as bandwidth increases, the problem size quickly grows to exceed whatever the limits are. For this reason, the investigation of gigabyte networks is currently underway at Los Alamos National Laboratory.  相似文献   

17.
There has recently been a tremendous rebirth of interest in neural networks, ranging from distributed and localist spreading-activation networks to semantic networks with symbolic marker-passing. Ideally these networks would be encoded in dedicated massively-parallel hardware that directly implements their functionality. Cost and flexibility concerns, however, necessitate the use of general-purpose machines the simulate neural networks, especially in the research stages in which various models are being explored and tested. Issues of a simulation's timing and control become more critical when models are made up of heterogeneous networks in which nodes have different processing characteristics and cycling rates or which are made up of modular, interacting sub-networks. We have developed a simulation environment to create, operate, and control these types of connectionist networks. This paper describes how massively-parallel heterogeneous networks are simulated on serial machines as efficiently as possible, how large-scale simulations could be handled on current SIMD parallel machines, and outlines how the simulator could be implemented on its ideal hardware, a large-scale MIMD parallel machine.  相似文献   

18.
Parallel algorithms on SIMD (single-instruction stream multiple-data stream) machines for hierarchical clustering and cluster validity computation are proposed. The machine model uses a parallel memory system and an alignment network to facilitate parallel access to both pattern matrix and proximity matrix. For a problem with N patterns, the number of memory accesses is reduced from O(N 3) on a sequential machine to O(N2) on an SIMD machine with N PEs  相似文献   

19.
In this paper we use the event synchronization scheme to develop a new method for parallel simulation of many discrete event dynamic systems simultaneously. Though a few parallel simulation methods have been developed during the last several years, such as the well-known Standard Clock method, most of them are largely limited to Markovian systems. The main advantage of our method is its applicability to non-Markovian systems. For Markovian systems a comparison study on efficiency between our method and the Standard Clock method is done on Connection Machine CM-5. CM-5 is a parallel machine with both SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction Multiple Data) architectures. The simulation results show that if event rates of Markovian systems do not differ by much then both methods are compatible but the Standard Clock method performs better in most cases. For Markovian systems with very different event rates, our method often yields better results. Most importantly, our simulation results also show that our method works as efficiently for non-Markovian systems as for Markovian systems.  相似文献   

20.
The interconnection network in large-scale parallel/distributed supercomputer systems is a crucial component. Three networks are overviewed here. Multistage cube networks represent an important family of networks, which includes the omega, n-cube, multistage shuffle-exchange, delta, baseline, SW-banyan, and Generalized Cube. This family has been used or proposed for use in such systems as staran, pasm, Ultracomputer, the BBN Butterfly, the IBM RP3, and data-flow machines. The multistage cube topology, distributed routing control, and ability to be partitioned into independent subnetworks are examined. The Extra Stage Cube (ESC), a single-fault-tolerant multistage cube network, is described. The structure, control, and partitionability of the ESC, and how it functions when multiple faults occur, are presented. The Dynamic Redundancy (DR) network, a fault-tolerant multistage cube network that supports the incorporation of spare processors for fault tolerance, is discussed. Its structure, control, and partitionability into single-fault-tolerant subnetworks are explained.This research was supported by the Air Force Office of Scientific Research under grant F49620-86-K-0006, the Rome Air Development Center under grant F30602-83-K-0119, and the Purdue Research Foundation David Ross Grant 1985/86 no. 0857.currently with the Supercomputing Research Center, 4380 Forbes Blvd., Lanham, MD 20706 (as of June 1, 1987).currently with Computer Science Department, University of Illinois, Urbana-Champaign, IL 61801.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号