期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Data management and control-flow aspects of an SIMD/SPMD parallellanguage/compiler

Nichols M.A. Siegel H.J. Dietz H.G. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(2):222-234

Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines 相似文献

2.

Polymorphic-torus architecture for computer vision

Li H. Maresca M. 《IEEE transactions on pattern analysis and machine intelligence》1989,11(3):233-243

A massively parallel fine-grained SIMD (single-instruction multi-data-stream) computer for machine vision computations is described. The architecture features a polymorphic-torus network which inserts an individually controllable switch into every node of the two-dimensional torus such that the network is dynamically reconfigurable to match the algorithm. Reconfiguration is accomplished by circuit switching and is achieved at fine-grained level. Using both the processor coordinate in the torus and the data for reconfiguration, the polymorphic-torus achieves solution time that is superior or equivalent to that of popular vision architectures such as mesh, tree, pyramid and hypercube for many vision algorithms discussed. Implementation of the architecture is given to illustrate its VLSI efficiency 相似文献

3.

Fast solution of large N×N matrix equations in an MIMD–SIMD Hybrid System

Leo Chin Sim Graham Leedham Leo Chin Jian Heiko Schroder 《Parallel Computing》2003,29(11-12):1669

In this paper, we propose a new high-speed computation algorithm for solving a large N×N matrix system using the MIMD–SIMD Hybrid System. The MIMD–SIMD Hybrid System (also denoted as Hybrid System in this paper) is a new parallel architecture consisting of a combination of Cluster of Workstations (COWs) and SIMD systems working concurrently to produce an optimal parallel computation. We first introduce our prototype SIMD system and our Hybrid System setup before presenting how it can be implemented to find the unknowns in a large N×N linear matrix equation system using the Gauss–LU algorithm. This algorithm basically performs the ‘Divide and Conquer’ approach by breaking down the large N×N matrix system into a manageable 32 × 32 matrix for fast computation. 相似文献

4.

Parallel computer vision on a reconfigurable multiprocessor network

Ehandarkar S.M. Arabnia H.R. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(3):292-309

A novel reconfigurable architecture based on a multiring multiprocessor network is described. The reconfigurability of the architecture is shown to result in a low network diameter and also a low degree of connectivity for each node in the network. The mathematical properties of the network topology and the hardware for the reconfiguration switch are described. Primitive parallel operations on the network topology are described and analyzed. The architecture is shown to contain 2D mesh topologies of varying sizes and also a single one factor of the Boolean hypercube in any given configuration. A large class of algorithms for the 2D mesh and the Boolean n-cube are shown to map efficiently on the proposed architecture without loss of performance. The architecture is shown to be well suited for a number of problems in low and intermediate level computer vision such as the FFT, edge detection, template matching, and the Hough transform. Timing results for typical low and intermediate level vision algorithms on a transputer based prototype are presented 相似文献

5.

Functionally reconfigurable general purpose parallel machines and some image processing and pattern recognition applications

Nikola K Kasabov 《Pattern recognition letters》1985,3(3):215-223

Functionally reconfigurable general purpose parallel machines (FRPM) could be reconfigured during the operation from SIMD to MIMD mode or vice versa (first aspect) and from one interconnection network to another according to the data storing order (second aspect). General purpose machines are considered in order to obtain an arbitrary data exchange between the processing elements they are built of. A model for describing such interconnection networks is presented. A full-information exchange network in introduced which is reconfigurable in a programming way to tree-, matrix-, cube-, linear-neighbourhood and FFT-network. Some schemes for constructing SIMD/MIMD reconfigurable machines are given. The usefullness of using FRMP for image processing and pattern recognition is discussed. 相似文献

6.

Parallel computer vision on Polymorphic Torus architecture

Massimo Maresca Hungwen Li Michael M. C. Sheng 《Machine Vision and Applications》1989,2(4):215-230

Polymorphic Torus is a novel interconnection network for SIMD massively parallel computers, able to support effectively both local and global communication. Thanks to this characteristic, Polymorphic Torus is highly suitable for computer vision applications, since vision involves local communication at the low-level stage and global communication at the intermediate- and high-level stages. In this paper we evaluate the performance of Polymorphic Torus in the computer vision domain. We consider a set of basic vision tasks, namely,convolution, histogramming, connected component labeling, Hough transform, extreme point identification, diameter computation, andvisibility, and show how they can take advantage of the Polymorphic Torus communication capabilities. For each basic vision task we propose a Polymorphic Torus parallel algorithm, give its computational complexity, and compare such a complexity with the complexity of the same task inmesh, tree, pyramid, and hypercube interconnection networks. In spite of the fact that Polymorphic Torus has the same wiring complexity as mesh, the comparison shows that in all of the vision tasks under examination it achieves complexity lower than or at most equal to hypercube, which is the most powerful among the interconnection networks considered. 相似文献

7.

Eliminating memory for fragmentation within partitionable SIMD/SPMDmachines

Nichols M.A. Siegel H.J. Dietz H.G. Quong R.W. Nation W.G. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(3):290-303

Efficient data layout is an important aspect of the compilation process. A model for the creation of perfect memory maps for large-scale parallel machines capable of user-controlled partitionable single-instruction-multiple data/single-program-multiple data (SIMD/SPMD) operation is developed. The term perfect implies that no memory fragmentation occurs and ensures that the memory map size is kept to a minimum. A major constraint on solving this problem is based on the single program nature of both the SIMD and SPMD modes of parallelism. It is assumed that all processors within the same submachine used identical addresses to access corresponding data items in each of their local memories. Necessary and sufficient conditions are derived for being able to create perfect memory maps, and results are applied to several partitionable interconnection networks 相似文献

8.

Data-parallel C on a reconfigurable logic array

Maya Gokhale Brian Schott 《The Journal of supercomputing》1995,9(3):291-313

相似文献

9.

Euclidean distance transform for binary images on reconfigurable mesh-connected computers.

Y Pan M Hamdi K Li 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2000,30(1):240-244

The distance calculation in an image is a basic operation in computer vision, pattern recognition, and robotics. Several parallel algorithms have been proposed for calculating the Euclidean distance transform (EDT). Recently, Chen and Chuang proposed a parallel algorithm for computing the EDT on mesh-connected SIMD computers (1995). For an nxn image, their algorithm runs in O(n) time on a two-dimensional (2-D) nxn mesh-connected processor array. In this paper, we propose a more efficient parallel algorithm for computing the EDT on a reconfigurable mesh model. For the same problem, our algorithm runs in O(log(2)n) time on a 2-D nxn reconfigurable mesh. Since a reconfigurable mesh uses the same amount of VLSI area as a plain mesh of the same size does when implemented in VLSI, our algorithm improves the result in [3] significantly. 相似文献

10.

Parallel Implementations of Block-Based Motion Vector Estimation for Video Compression on Four Parallel Processing Systems

Min Tan Janet M. Siegel Howard Jay Siegel 《International journal of parallel programming》1999,27(3):195-225

Parallel algorithms, based on a distributed memory machine model, for an exhaustive search technique for motion vector estimation in video compression are being designed and evaluated. Results from the execution on a 16,384 processor MasPar MP-1 (an SIMD machine), a 140 node Intel Paragon XP/S and a 16 node IBM SP2 (two M IMD machines), and the 16 processor PASM prototype (a partitionable SIMD/MIMD mixed-mode machine) are presented. The trade-offs of using different modes of parallelism (SIMD, SPMD, and mixed-mode) and different data partitioning schemes (the rectangular and stripe subimage methods) are examined. The analytical and experimental results shown in this application study will help practitioners to predict and contrast the performance of different approaches to parallel implementation of this important video compression technique. The results presented are also applicable to a large class of image and video processing tasks. Case studies, such as the one presented here, are a necessary step in developing software tools for mapping an application task onto a single parallel machine and for mapping a set of independent application tasks, or the subtasks of a single application task, onto a heterogeneous suite of parallel machines. 相似文献

11.

Communication-Efficient Sorting Algorithms on Reconfigurable Array of Processors With Slotted Optical Buses

《Journal of Parallel and Distributed Computing》1999,57(2):166-187

The reconfigurable array with slotted optical buses (RASOB) has recently received a lot of attention from the research community. In this paper, we first discuss the reconfiguration methods and communication capabilities of the RASOB architecture. Then, we use this architecture for the implementation of efficient sorting algorithms on the 1D RASOB and the 2D RASOB. Our parallel sorting algorithm on the 1D RASOB is based on an efficient divide-and-conquer scheme. It sortsNdata items usingNprocessors inO(k) communication cycles where k is the size of the data items to be sorted in bits. We further develop a parallel sorting algorithm on the 2D RASOB based on the sorting algorithm on the 1D RASOB in conjunction with the well known Rotatesort algorithm. Similarly, this algorithm sortsNdata items on a 2D RASOB of sizeNinO(k) communication cycles. These sorting algorithms are much more efficient than state-of-the-art sorting algorithms on reconfigurable arrays of processors withelectronicbuses using the same number of processors. 相似文献

12.

A method for SIMDMIMD functionally reconfigurable multimicroprocessor systems design and parallel data exchange algorithms

Nikola K Kasabov 《Parallel Computing》1985,2(1):73-78

In

SIMD MIMD

functionally reconfigurable multimicroprocessor systems /MMPS/ some of the microprocessor modules /MPM/ can execute a common program /SIMD mode/ while the rest of the MPMs are executing their own programs /MIMD mode/. Every MPM at any moment can be reconfigured functionally from one to another mode. In this paper the problems of designing such MMPSs are discussed as well as some realisations of a data exchange module as a register module and some algorithms for parallel data exchange between the MPMs. A hierarchically structed MMPS are developed. 相似文献

13.

一个数据并行语言的设计及其实现

陈斯愈黄林鹏《计算机工程》1997,23(3):3-6

数据并行模型应用到ＭＩＭＤ机器上，实现ＳＰＭＤ模式的松散同步的方式越来越受到人们的重视。文中提出了一个以屏构并行系统为环境的数据并行语言Ｍｕｌｔｉ－ｃ的设计和实现。正在实现的Ｍｕｌｉｔｉ－ｃ编译器，以预编译的方式接受ＳＩＭＤ形式的程序说明，放宽同步要求，产生能以ＳＰＭＫ方式在并行系统上运行的Ｃ程序。相似文献

14.

Dynamic reconfiguration of node location in wormhole networks

《Journal of Systems Architecture》2000,46(10):873-888

Several techniques have been developed to increase the performance of parallel computers. Reconfigurable networks can be used as an alternative to increase the performance. Network reconfiguration can be carried out in different ways. Our research has focused on distributed memory systems with dynamic reconfiguration of node location. Briefly, this technique consists of positioning the processors in the network depending on the existing communication pattern among them, to suit the requirements of each computation.In this article, we present a dynamic reconfiguration technique for wormhole networks. We have used both a crossbar and a multistage interconnection network to implement a reconfigurable logical two-dimensional (2-D) torus topology. The reconfiguration mechanism is based on a distributed reconfiguration algorithm. The algorithm is based on a cost function that requires only local information. We discuss reconfiguration features and adjust the different parameters of the reconfiguration algorithm. We have also studied the deadlock problem in reconfigurable wormhole networks, and give details of our solution. Finally, we have evaluated the performance of this technique under several workloads. 相似文献

15.

Odd Memory Systems: A New Approach

Seznec A. Lenfant J. 《Journal of Parallel and Distributed Computing》1995,26(2)

To reject the use of a prime (or odd) number N of memory banks in a vector processor, it is generally advanced that address computation for such a memory system would require systematic Euclidean division by the number N. We first show that the Chinese Remainder Theorem allows one to define a very simple mapping of data onto the memory banks for which address computation does not require any Euclidean division. Massively parallel SIMD computers may have thousands of processors. When the memory on such a machine is globally shared, routing vectors from memory to the processors is a major difficulty; the control for the interconnection network cannot be generally computed at execution time. When the number of memory banks and processors is a product of prime numbers, the family of permutations needed for routing vectors from memory to the processors through the interconnection network has very specific properties. The Chinese Remainder Network presented in the paper is able to execute all these permutations in a single path and may be easily controlled. 相似文献

16.

UML-based hardware/software co-design platform for dynamically partially reconfigurable network security systems

Chun-Hsian Huang Pao-Ann Hsiung Jih-Sheng Shen 《Journal of Systems Architecture》2010,56(2-3):88-102

The dynamic partial reconfiguration technology of FPGA has made it possible to adapt system functionalities at run-time to changing environment conditions. However, this new dimension of dynamic hardware reconfigurability has rendered existing CAD tools and platforms incapable of efficiently exploring the design space. As a solution, we proposed a novel UML-based hardware/software co-design platform (UCoP) targeting at dynamically partially reconfigurable network security systems (DPRNSS). Computation-intensive network security functions, implemented as reconfigurable hardware functions, can be configured on-demand into a DPRNSS at run-time. Thus, UCoP not only supports dynamic adaptation to different environment conditions, but also increases hardware resource utilization. UCoP supports design space exploration for reconfigurable systems in three folds. Firstly, it provides reusable models of typical reconfigurable systems that can be customized according to user applications. Secondly, UCoP provides a partially reconfigurable hardware task template, using which users can focus on their hardware designs without going through the full partial reconfiguration flow. Thirdly, UCoP provides direct interactions between UML system models and real reconfigurable hardware modules, thus allowing accurate time measurements. Compared to the existing lower-bound and synthesis-based estimation methods, the accurate time measurements using UCoP at a high abstraction level can more efficiently reduce the system development efforts. 相似文献

17.

A reconfigurable architecture for autonomous visual-navigation

Jose Antonio Boluda Fernando Pardo 《Machine Vision and Applications》2003,13(5-6):322-331

Abstract. This paper describes the design of a reconfigurable architecture for implementing image processing algorithms. This architecture is a pipeline of small identical processing elements that contain a programmable logic device (FPGA) and double port memories. This processing system has been adapted to accelerate the computation of differential algorithms. The log-polar vision selectively reduces the amount of data to be processed and simplifies several vision algorithms, making possible their implementation using few hardware resources. The reconfigurable architecture design has been devoted to implementation, and has been employed in an autonomous platform, which has power consumption, size and weight restrictions. Two different vision algorithms have been implemented in the reconfigurable pipeline, for which some experimental results are shown. Received: 30 March 2001 / Accepted: 11 February 2002 RID="*" ID="*" This work has been supported by the Ministerio de Ciencia y Tecnología and FEDER under project TIC2001-3546 Correspondence to: J.A. Boluda 相似文献

18.

Proteus: A reconfigurable computational network for computer vision

Robert M. Haralick Arun K. Somani Craig Wittenbrink Robert Johnson Kenneth Cooper Linda G. Shapiro Ihsin T. Phillips Jenq-Neng Hwang William Cheung Yung Hsi Yao Chung-Ho Chen Larry Yang Brian Daugherty Bob Lorbeski Kent Loving Tom Miller Larye Parkins Steve Soos 《Machine Vision and Applications》1995,8(2):85-100

The Proteus architecture is a highly parallel, multiple instruction, multiple data machine (MIMD) optimized for large granularity tasks such as machine vision and image processing. The system can achieve 20 gigaflops (80 gigaflops peak). It accepts data via multiple serial links at a rate of up to 640 MB/S. The system employs a hierarchical reconfigurable interconnection network with the highest level being a circuit-switchedenhanced hypercube, serial interconnection network for internal data transfers. The system is designed to use 256 to 1024 RISC processors. The processors use 1-MB externalread/write allocating caches for reduced multiprocessor contention. The system detects, locates, and replaces faulty subsystems using redundant hardware to facilitatefault tolerance. The parallelism is directly controllable through an advanced software system for partitioning, scheduling, and development. System software includes a translator for the INSIGHT language, a parallel debugger, lowand high-level simulators, and a message-passing system for all control needs. Image-processing application software includes a variety of point operators, neighborhood operators, convolution, and the mathematical morphology operations of binary and gray-scale dilation, erosion, opening, and closing. 相似文献

19.

Data reduction and fast routing: A strategy for efficient algorithms for message-passing parallel computers

Jorge L. C. Sanz Robert Cypher 《Algorithmica》1992,7(1):77-89

This paper presents several algorithms for solving problems using massively parallel SIMD hypercube and shuffle-exchange computers. The algorithms solve a wide variety of problems, but they are related because they all use a common strategy. Specifically, all of the algorithms use a divide-and-conquer approach to solve a problem withN inputs using a parallel computer withP processors. The structural properties of the problem are exploited to assure that fewer thanN data items are communicated during the division and combination steps of the divide-and-conquer algorithm. This reduction in the amount of data that must be communicated is central to the efficiency of the algorithm.This paper addresses four problems, namely the multiple-prefix, data-dependent parallel-prefix, image-component-labeling, and closest-pair problems. The algorithms presented for the data-dependent parallel-prefix and closest-pair problems are the fastest known whenN P and the algorithms for the multiple-prefix and image-component-labeling problems are the fastest known whenN is sufficiently large with respect toP.This work was supported in part by our NSF Graduate Fellowship. 相似文献

20.

Representation of Edmonds' Algorithm for Finding Optimum Graph Branching on Associative Parallel Processors

A. Sh. Nepomniaschaya 《Programming and Computer Software》2001,27(4):200-206

In the paper, an efficient parallel implementation of Edmonds' algorithm is suggested for finding optimum graph branching on an abstract model of the SIMD type with vertical data processing (STAR machine). For this, associative parallel algorithms for finding critical circuit and its contraction, as well as for unfolding embedded critical circuits, are constructed for directed weighted graphs represented as a list of arcs and their weights. It is shown that the execution of Edmonds' algorithm on a STAR machine requires O(nlogn) time, where nis the number of graph vertexes. Basic advantages of the parallel implementation of Edmonds' algorithm compared to its implementation on sequential computers are discussed. 相似文献