首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively-parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines  相似文献   

2.
Real-time image analysis requires the use of massively parallel machines. Conventional parallel machines consist of an array of identical processors organized in either single instruction multiple data (SIMD) or multiple instruction multiple data (MIMD) configurations. Machines of this type generally only operate effectively on parts of the image analysis problem. SIMD on the low level processing and MIMD on the high level processing. In this paper we describe the Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple-SIMD (MSIMD) organization which can operate effectively at all levels of the image analysis problem.  相似文献   

3.
Image processing problems frequently involve large structured arrays of data and a need for very rapid computation. Special parallel processing schemes have evolved over the last 20 years to deal with these problems. In this paper many parallel systems which have been developed for image processing are outlined and the features of their underlying architectures are discussed. Most of these special architectures may be loosely classified as either SIMD or pipeline structures although some MIMD structures have been designed for high level image analysis. In recent years several multiple SIMD (MSIMD) schemes have been proposed as suitable architectures for image processing. The fundamental problems of developing an effective MSIMD system are discussed and a simple SIMD/MIMD computational model for comparison with such systems is proposed.  相似文献   

4.
Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines  相似文献   

5.
The four general methods of parallel processing are described. These are: single instruction single data stream (SISD), multiple instruction single data stream (MISD), single instruction multiple data stream (SIMD), and multiple instruction multiple data stream (MIMD). Most single computers in use are SISD machines, while most parallel processing applications use the MIMD aproach. The paper outlines and compares the four basic MIMD architectures: tightly coupled, loosely coupled, voting, and peripheral processing. The latter is one of the most practical methods using existing minicomputers. Many modern programmable intelligent I/O controllers are peripheral processors. For a computer manufacturer to remain competitive, it is concluded that he will have to include such devices in his hardware.  相似文献   

6.
An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computational intensive algorithms such as the FET  相似文献   

7.
We present new methods for load balancing of unstructured tree computations on large-scale SIMD machines, and analyze the scalability of these and other existing schemes. An efficient formulation of tree search on an SIMD machine consists of two major components: a triggering mechanism, which determines when the search space redistribution must occur to balance the search space over processors, and a scheme to redistribute the search space. We have devised a new redistribution mechanism and a new triggering mechanism. Either of these can be used in conjunction with triggering and redistribution mechanisms developed by other researchers. We analyze the scalability of these mechanisms and verify the results experimentally. The analysis and experiments show that our new load-balancing methods are highly scalable on SIMD architectures. Their scalability is shown to he no worse than that of the best load-balancing schemes on MIMD architectures. We verify our theoretical results by implementing the 15-puzzle problem on a CM-2 SIMD parallel computer  相似文献   

8.
Functionally reconfigurable general purpose parallel machines (FRPM) could be reconfigured during the operation from SIMD to MIMD mode or vice versa (first aspect) and from one interconnection network to another according to the data storing order (second aspect). General purpose machines are considered in order to obtain an arbitrary data exchange between the processing elements they are built of. A model for describing such interconnection networks is presented. A full-information exchange network in introduced which is reconfigurable in a programming way to tree-, matrix-, cube-, linear-neighbourhood and FFT-network. Some schemes for constructing SIMD/MIMD reconfigurable machines are given. The usefullness of using FRMP for image processing and pattern recognition is discussed.  相似文献   

9.
In a SIMD or VLIW machine, conceptual synchronizations are accomplished by using a static code schedule that does not require run-time synchronization. The lack of run-time synchronization overhead makes these machines very effective for fine-grain parallelism, but they cannot execute parallel code structures as general as those executed by MIMD architectures, and this limits their utility.In this paper we present a timing analysis that allows a compiler for a MIMD machine to eliminate a large fraction of the run-time synchronization by making efficient use of static code scheduling. Although these techniques can be adapted to be applied to most MIMD machines, this paper centers on the analysis and scheduling for barrier MIMD machines. Barrier MIMDs are asynchronous multiple instruction stream/multiple data stream architectures capable of parallel execution of variable execution-time instructions and arbitrary control flow (e.g., while loops and calls). However, they also incorporate a special hardware barrier synchronization mechanism that facilitates static scheduling by providing a mechanism which the compiler can use to enforce precise timing constraints. In other words, the compiler tracks relative timing between processors and uses static code scheduling until the timing imprecision becomes too large, at which point the compiler simply inserts a barrier to reduce that timing imprecision to zero (or a small constant).This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Ellis 1985]. In addition, the experimental results from scheduling thousands of synthetic benchmark programs for a parameterized barrier MIMD machine are presented.  相似文献   

10.
SIMD machines are considered special purpose architectures chiefly because of their inability to support control parallelism. This restriction exists because there is a single control unit that is shared at the thread level; concurrent control threads must time-share the control unit (they are sequentially executed). We present an alternative model for building centralized control architectures that allows better support for control parallelism. This model, called shared control, shares the control unit(s) at the instruction level. More precisely, in each cycle the control signals for all the supported instructions are broadcast to the PEs. In turn, each PE receives its control by synchronizing with the control unit responsible for its local instruction. The shared control model is fundamentally different from the SIMD model. There are a number of architectural issues that must be resolved in order for this model to be useful. This paper identifies some of these issues and discusses their respective trade-off spaces. An integrated shared-control/SIMD architecture design (SharC) is presented and used to demonstrate the relative performance relative to a SIMD architecture.  相似文献   

11.
Parallel processing is an area of growing interest to the computer science and engineering communities. This paper is an introduction to some of the concepts involved in the design and use of large-scale parallel systems. Parallel machines that are classified as SIMD (synchronous) and MIMD (asynchronous) systems, composed of a large number of microprocessors, are explored. Parallel algorithms are examined, using image smoothing, recursive doubling and contour tracing as examples. Single stage and multistage networks are discussed. The single stage Cube, PM21, Four Nearest Neighbor and Shuffle-Exchange networks are presented, and the multistage Cube network is described. Case studies of three microprocessor-based systems are given as examples of parallel machine designs, specifically the MPP SIMD machine, the Ultracomputer MIMD system, and the PASM SIMD/MIMD machine.  相似文献   

12.
In this paper, we consider a massively parallel system that is composed of heterogeneous processors, that is, processors with different processing power, and that combines the advantages of the SIMD and MIMD architectures. The heterogeneous mixed-mode (HeMM) execution model is composed of two main components, which operate in the well-known SIMD and MIMD paradigms. The main computing power comes from a component that is composed of a massive number of processors and operates in a data parallel manner. The other component is composed of a few (or even one) fast processors which operate in the MIMD paradigm. The operation of a small number of processors in an MIMD paradigm has been well demonstrated through actual systems. The processors in this component add flexibility to the execution of the parallel programs such that it adjusts to the changing parallelism of the program to enhance the performance. Based on this execution model we analyze the gains in performance that is obtainable by this new system. We show that substantial performance gains can be obtained by using the HeMM system.  相似文献   

13.
This paper presents bitonic sorting schemes for special-purpose parallel architectures such as sorting networks and for general-purpose parallel architectures such as SIMD and/or MIMD computers. First, bitonic sorting algorithms for shared-memory SIMD and/or MIMD computers are developed. Shared-memory accesses through the interconnection network of shared memory SIMD and/or MIMD computers can be very time consuming. A scheme is introduced which reduces the number of such accesses. This scheme is based on the parity strategy which is the main idea of the paper. By reducing the communication through the network, a performance improvement is achieved. Second, a recirculating bitonic sorting network is presented, which is composed of one level of N/2 comparators plus an Ω-network of (log N-1) switch levels. This network reduces the cost complexity to O(N log N) compared with the O(N log2 N) of the original bitonic sorting network, while preserving the same time complexity. Finally, a simplified multistage bitonic sorting network, is presented. For simplifying the interlevel wiring, the parity strategy is used, so N/2 keys are wired straight through the network  相似文献   

14.
《Data Processing》1986,28(8):405-409
Parallel processing can reduce the cost of computing, improve reliability and provide greater throughput. Equally important is the likelihood that real-life problems contain parallelism that should be exploited in the computer architecture if natural solutions are to be found. Many parallel machines are now on the market, both single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD), but software engineering techniques have yet to make their complete impact on these architectures.  相似文献   

15.
This paper describes changes made to a previous implementation of an N-body tree code developed for a fine-grained, SIMD computer architecture. These changes include (1) switching from a balanced binary tree to a balanced oct tree, (2) addition of quadrupole corrections, and (3) having the particles search the tree in groups rather than individually. An algorithm for limiting errors is also discussed. In aggregate, these changes have led to a performance increase of over a factor of 10 compared to the previous code. For problems several times larger than the processor array, the code now achieves performance levels of ∼ 1 Gflop on the Maspar MP-2 or roughly 20% of the quoted peak performance of this machine. This percentage is competitive with other parallel implementations of tree codes on MIMD architectures. This is significant, considering the low relative cost of SIMD architectures.  相似文献   

16.
The performance of conjugate gradient (CG) algorithms for the solution of the system of linear equations that results from the finite-differencing of the neutron diffusion equation was analyzed on SIMD, MIMD, and mixed-mode parallel machines. A block preconditioner based on the incomplete Cholesky factorization was used to accelerate the conjugate gradient search. The issues involved in mapping both the unpreconditioned and preconditioned conjugate gradient algorithms onto the mixed-mode PASM prototype, the SIMD MasPar MP-1, and the MIMD Intel Paragon XP/S are discussed. On PASM , the mixed-mode implementation outperformed either SIMD or MIMD alone. Theoretical performance predictions were analyzed and compared with the experimental results on the MasPar MP-1 and the Paragon XP/S. Other issues addressed include the impact on execution time of the number of processors used, the effect of the interprocessor communication network on performance, and the relationship of the number of processors to the quality of the preconditioning. Applications studies such as this are necessary in the development of software tools for mapping algorithms onto either a single parallel machine or a heterogeneous suite of parallel machines.  相似文献   

17.
Skillicorn  D.B. 《Computer》1990,23(12):38-50
The major parallel architecture classes are considered: single-instruction multiple-data (SIMD) computers, tightly coupled multiple-instruction multiple-data (MIMD) computers, hypercuboid computers and constant-valence MIMD computers. An argument that the PRAM model is universal over tightly coupled and hypercube systems, but not over constant-valence-topology, loosely coupled-system is reviewed, showing precisely how the PRAM model is too powerful to permit broad universality. Ways in which a model of computation can be restricted to become universal over less powerful architectures are discussed. The Bird-Meertens formalism (R.S. Bird, 1989), is introduced and it is shown how it is used to express computations in a compact way. It is also shown that the Bird-Meertens formalism is universal over all four architecture classes and that nontrivial restrictions of functional programming languages exist that can be efficiently executed on disparate architectures. The use of the Bird-Meertens formalism as the basis for a programming language is discussed, and it is shown that it is expressive enough to be used for general programming. Other models and programming languages with architecture-independent properties are reviewed  相似文献   

18.
一、引言 并行处理是提高计算机性能的有效途径,已成为计算机系统结构研究的热点。IMD(单指令多数据流)计算机由M.J.Flynn,在1966年对计算机系  相似文献   

19.
Stochastic ray tracing is one of the most elegant methods for anti-aliasing and for generating such phenomena as soft shadows, fuzzy reflections, depth of field, and motion blur, which are difficult to accomplish with the conventional ray-tracing algorithm. Unfortunately, it makes use of stochastic sampling, which requires more than one sample for each pixel. One possible way to speed up ray tracing is to explore the inherent parallelism of the algorithm. In the past few years, the major focus of parallel ray-tracing research has been on the use of MIMD architectures. Although SIMD architectures may be ideal for ray tracing simple scenes, they have been thought unsuitable for ray tracing complex scenes. However, by using scene coherence, we have found that stochastic ray tracing using SIMD processor arrays can be as efficient as most of the existing MIMD ray-tracing algorithms and more cost effective.  相似文献   

20.
PASM is a proposed large-scale distributed/parallel processing system which can be partitioned into independent SIMD/MIMD machines of various sizes. One design problem for systems such as PASM is task scheduling. The use of multiple FIFO queues for nonpreemptive task scheduling is described. Four multiple-queue scheduling algorithms with different placement policies are presented and applied to the PASM parallel processing system. Simulation of a queueing network model is used to compare the performance of the algorithms. Their performance is also considered in the case where there are faulty control units and processors. The multiple-queue scheduling algorithms can be adapted for inclusion in other multiple-SIMD and partitionable SIMD/MIMD systems that use similar types of interconnection networks to those being considered for PASM.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号