
1.

Datawave: a single-chip multiprocessor for video applications

Schmidt U. Caesar K. 《Micro, IEEE》1991,11(3)

A fine-grained MIMD (multiple-instruction, multiple-data) array processor for video applications that combines submicron technology, parallel processing, and dataflow programming is presented. The Datawave processor is used as the building block of this cellular, data-driven system architecture. The processor executes statically scheduled dataflow programs, and self-timed hardware mechanisms handle the asynchronous dataflows automatically and transparently. The architecture is discussed first at the array level and then at the cell level. It is shown how Datawave implements a four-tap finite impulse response filter and a real-time image codec. Program development tools for Datawave are discussed, and the chip itself is briefly described.
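The four-tap finite impulse response filter mentioned in the abstract can be illustrated with a minimal sequential sketch. This is not the Datawave dataflow program; the function name `fir_filter` and the coefficients in `TAPS` are hypothetical, chosen only to show the computation y[n] = sum over k of taps[k] * x[n - k].

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        out.append(acc)
    return out


# Hypothetical four-tap coefficients (an assumption, not from the paper).
TAPS = [1, 2, 2, 1]

# Feeding a unit impulse returns the taps themselves, padded with zeros.
impulse_response = fir_filter([1, 0, 0, 0, 0, 0], TAPS)
```

In a dataflow array, each multiply-accumulate stage would plausibly map to its own cell, with samples streaming between neighbors; the loop above only fixes the arithmetic.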

2.

An associative processing module for a heterogeneous vision architecture

Storer R. Pout M.R. Thomson A.R. Dagless E.L. Duller A.W.G. Marriott A.P. Hicks P.J. 《Micro, IEEE》1992,12(3):42-55

The heterogeneous vision architecture that satisfies the computing demands of real-time computer vision by providing parallelism in three different forms is described. A pipeline of digital signal processing (DSP) chips initially processes signals. Then a SIMD associative processor array processes images and extracts features, and a MIMD network of transputers processes extracted objects in parallel. The array's VLSI implementation, the processing modes available due to the use of content-addressable memory, and the means of achieving efficient 2-D interprocessor communication in the linear array are described. An application as a vehicle number plate recognition system is presented.

3.

Hierarchical multiple-SIMD architecture for image analysis

Graham Nudd Nick Francis Tim Atherton Darren Kerbyson Roger Packwood John Vaudin 《Machine Vision and Applications》1992,5(2):85-103

Real-time image analysis requires the use of massively parallel machines. Conventional parallel machines consist of an array of identical processors organized in either single instruction multiple data (SIMD) or multiple instruction multiple data (MIMD) configurations. Machines of this type generally operate effectively on only part of the image analysis problem: SIMD on the low-level processing and MIMD on the high-level processing. In this paper we describe the Warwick Pyramid Machine, an architecture consisting of both SIMD and MIMD parts in a multiple-SIMD (MSIMD) organization, which can operate effectively at all levels of the image analysis problem.

4.

General-purpose systolic arrays

Johnson K.T. Hurson A.R. Shirazi B. 《Computer》1993,26(11):20-31

The extension of systolic array architecture from fixed- or special-purpose architectures to general-purpose, SIMD (single-instruction stream, multiple-data stream), MIMD (multiple-instruction stream, multiple-data stream) architectures, and hybrid architectures that combine both commercial and FPGA (field-programmable gate array) technologies is chronicled. The authors present a taxonomy for systolic organizations, discuss each architecture's methods of exploiting concurrencies, and compare performance attributes of each. The authors also describe a number of implementation issues that determine a systolic array's performance efficiency, such as algorithms and mapping, system integration through memory subsystems, cell granularity, and extensibility to a wide variety of topologies.

5.

Orthogonal multiprocessor sharing memory with an enhanced mesh for integrated image understanding

《CVGIP: Image Understanding》1991,53(1):31-45

This paper proposes a new parallel architecture, which has the potential to support low-level image processing as well as intermediate and high-level vision analysis tasks efficiently. The integrated architecture consists of an SIMD mesh of processors enhanced with multiple broadcast buses, an MIMD multiprocessor with orthogonal access buses, and a two-dimensional shared memory array. Low-level image processing is performed on the mesh processor, while intermediate and high-level vision analysis is performed on the orthogonal multiprocessor. The interaction between the two levels is supported by a common shared memory. Concurrent computations and I/O are made possible by partitioning the memory into disjoint spaces so that each processor system can access a different memory space. To illustrate the power of such a two-level system, we present efficient parallel algorithms for a variety of problems from low-level image processing to high-level vision. Representative problems include matrix-based computations, histogramming and key counting operations, image component labeling, pyramid computations, Hough transform, pattern clustering, and scene labeling. Through computational complexity analysis, we show that the integrated architecture meets the processing requirements of most image understanding tasks.

6.

HiMAP: a portable super modular multilevel parallel multidisciplinary process for large scale analysis

《Advances in Engineering Software》2000,31(8-9):617-620

7.

Architecture-independent parallel computation

Skillicorn D.B. 《Computer》1990,23(12):38-50

The major parallel architecture classes are considered: single-instruction multiple-data (SIMD) computers, tightly coupled multiple-instruction multiple-data (MIMD) computers, hypercuboid computers and constant-valence MIMD computers. An argument that the PRAM model is universal over tightly coupled and hypercube systems, but not over constant-valence-topology, loosely coupled systems, is reviewed, showing precisely how the PRAM model is too powerful to permit broad universality. Ways in which a model of computation can be restricted to become universal over less powerful architectures are discussed. The Bird-Meertens formalism (R.S. Bird, 1989) is introduced, and it is shown how it is used to express computations in a compact way. It is also shown that the Bird-Meertens formalism is universal over all four architecture classes and that nontrivial restrictions of functional programming languages exist that can be efficiently executed on disparate architectures. The use of the Bird-Meertens formalism as the basis for a programming language is discussed, and it is shown that it is expressive enough to be used for general programming. Other models and programming languages with architecture-independent properties are reviewed.
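As a rough illustration of why a formalism built from map and reduce can be architecture-independent, the sketch below composes a map with a reduction over an associative operator, then shows that the same result is obtained when the input is split into chunks that could be evaluated on separate processors. The helper names `map_reduce` and `chunked_map_reduce` are hypothetical, and Python stands in for the formalism's own notation, which the abstract does not reproduce.

```python
from functools import reduce
from operator import add


def map_reduce(f, op, identity, xs):
    """Apply f to every element, then fold the results with op."""
    return reduce(op, map(f, xs), identity)


def chunked_map_reduce(f, op, identity, xs, parts):
    """Same computation, split into roughly `parts` independent chunks.

    Because op is assumed associative with the given identity, each chunk
    can be reduced on its own and the partial results combined afterwards.
    """
    size = max(1, -(-len(xs) // parts))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    partials = [map_reduce(f, op, identity, chunk) for chunk in chunks]
    return reduce(op, partials, identity)


def square(x):
    return x * x


data = list(range(1, 9))
sequential = map_reduce(square, add, 0, data)          # sum of squares 1..8
split = chunked_map_reduce(square, add, 0, data, 4)    # same value, 4 chunks
```

The chunks are evaluated one after another here; the point is only that the program text does not fix how the work is distributed.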

8.

Parallel image understanding algorithms on MIMD multicomputers

A. Petrosino E. Tarantino 《Computing》1998,60(2):91-107

The heterogeneous nature of the data types and computational structures involved in computer vision algorithms makes the design and implementation of massively parallel image processing systems a not yet fully solved problem. It is a common belief that in the near future MIMD architectures, with their high degree of flexibility, will play a very important role in this research area by using a limited number of identical but powerful processing elements. The aim of this paper is to show how a selected list of algorithms into which a single image understanding process can be decomposed could map onto a distributed-memory MIMD architecture. The operating modes we adopt are the SPMD mode for low-level processing and the MIMD mode for the intermediate and high levels of processing. Both efficient parallel formulations of the algorithms with respect to the interconnection topology of the processors and their optimized implementations on a target transputer-based architecture are reported.

9.

The image understanding architecture

Charles C. Weems Steven P. Levitan Allen R. Hanson Edward M. Riseman David B. Shu J. Gregory Nash 《International Journal of Computer Vision》1989,2(3):251-282

This paper provides an overview of the Image Understanding Architecture (IUA), a massively parallel, multilevel system for supporting real-time image understanding applications and research in knowledge-based computer vision. The design of the IUA is motivated by considering the architectural requirements for integrated real-time vision in terms of the type of processing element, control of processing, and communication between processing elements. The IUA integrates parallel processors operating simultaneously at three levels of computational granularity in a tightly coupled architecture. Each level of the IUA is a parallel processor that is distinctly different from the other two levels, designed to best meet the processing needs at each of the corresponding levels of abstraction in the interpretation process. Communication between levels takes place via parallel data and control paths. The processing elements within each level can also communicate with each other in parallel, via a different mechanism at each level that is designed to meet the specific communication needs of each level of abstraction. An associative processing paradigm has been utilized as the principal control mechanism at the low and intermediate levels. It provides a simple yet general means of managing massive parallelism, through rapid responses to queries involving partial matches of processor memory to broadcast values. This has been enhanced with hardware operations that provide for global broadcast, local compare, Some/None response, responder count, and single responder select. To demonstrate how the IUA may be used for vision processing, several sample algorithms and a typical interpretation scenario on the IUA are presented. We believe that the IUA represents a major step toward the development of a proper combination of integrated processing power, communication, and control required for real-time computer vision. A proof-of-concept prototype of 1/64th of the IUA is currently being constructed by the University of Massachusetts and Hughes Research Laboratories.
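The associative operations listed in the abstract (global broadcast, local compare, Some/None response, responder count, single responder select) can be mimicked in software. The sketch below is a sequential toy model, not the IUA hardware or its instruction set; the class `AssociativeArray`, its method names, and the choice of "first responder" as the selection policy are all assumptions made for illustration.

```python
class AssociativeArray:
    """Toy model of an associative processor array with one word per cell."""

    def __init__(self, memories):
        self.memories = list(memories)
        self.responders = [False] * len(self.memories)

    def broadcast_compare(self, value, mask):
        """Global broadcast plus local compare on the masked bits only.

        A cell becomes a responder when its masked word equals the masked
        broadcast value, which models a partial match.
        """
        self.responders = [(m & mask) == (value & mask) for m in self.memories]

    def some_none(self):
        """Some/None response: True if at least one cell responded."""
        return any(self.responders)

    def responder_count(self):
        """Number of cells that responded to the last query."""
        return sum(self.responders)

    def select_single(self):
        """Index of one responder (here: the first), or None if there is none."""
        for index, responded in enumerate(self.responders):
            if responded:
                return index
        return None


array = AssociativeArray([0b1010, 0b0110, 0b1011, 0b0001])
# Query: which cells have the two high bits equal to 10?
array.broadcast_compare(0b1000, 0b1100)
found_any = array.some_none()
count = array.responder_count()
first = array.select_single()
```

In hardware all cells would compare simultaneously; the list comprehension only models the outcome of that step.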

10.

The UCSC Kestrel parallel processor

Di Blas A. Dahle D.M. Diekhans M. Grate L. Hirschberg J. Karplus K. Keller H. Kendrick M. Mesa-Martinez F.J. Pease D. Rice E. Schultz A. Speck D. Hughey R. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(1):80-92

The architectural landscape of high-performance computing stretches from superscalar uniprocessor to explicitly parallel systems, to dedicated hardware implementations of algorithms. Single-purpose hardware can achieve the highest performance and uniprocessors can be the most programmable. Between these extremes, programmable and reconfigurable architectures provide a wide range of choice in flexibility, programmability, computational density, and performance. The UCSC Kestrel parallel processor strives to attain single-purpose performance while maintaining user programmability. Kestrel is a single-instruction stream, multiple-data stream (SIMD) parallel processor with a 512-element linear array of 8-bit processing elements. The system design focuses on efficient high-throughput DNA and protein sequence analysis, but its programmability enables high performance on computational chemistry, image processing, machine learning, and other applications. The Kestrel system has had unexpected longevity in its utility due to a careful design and analysis process. Experience with the system leads to the conclusion that programmable SIMD architectures can excel in both programmability and performance. This work presents the architecture, implementation, applications, and observations of the Kestrel project at the University of California at Santa Cruz.

11.

The instruction systolic array and its relation to other models of parallel computers

《Parallel Computing》1988,7(1):25-39

In this paper we investigate the relationships between three different models of parallel computers based on mesh-connected arrays: the processor array (PA), which is an MIMD array of independent processors; the instruction broadcasting array (IBA), where the instructions are broadcast to all the processors of a column and executed according to selector information which is broadcast to all the processors of a row; and the instruction systolic array (ISA), where the instructions are pumped through the array row by row and combined with selector information which is pumped through the array column by column. For every two of these models we determine tight bounds on the worst-case delay introduced by a transformation of a program on one model into an equivalent program on the other. The results show that the ISA concept combines the advantages of standard systolic arrays with those of the MIMD concept. Since, in addition, the ISA architecture has smaller area requirements than a corresponding systolic array or MIMD machine, it is of strong practical relevance.

12.

Firefly 2: the hardware architecture of a polymorphic parallel machine

Li Tao Yang Ting Yi Xueyuan Pu Lin Qian Bowen Huang Guangxin Huang Hucai Han Jungang 《Computer Engineering & Science》2014,36(2):191-200

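A new polymorphic, highly efficient parallel array machine architecture, the Firefly 2 array machine, is proposed. Its processing elements can run in both SIMD and MIMD modes, support an asynchronous execution mechanism, and can also perform distributed instruction-level parallel processing. A hardware multithread manager and efficient communication mechanisms are adopted, which enable this array machine to achieve highly efficient thread-level, data-level, and distributed instruction-level parallel computation. Notably, the stream-processing performance of this array machine rivals that of application-specific integrated circuits. The architecture can also effectively perform static and dynamic dataflow computation, and can efficiently carry out graphics, image, and digital signal processing tasks.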

13.

A parallel processing VLSI BAM engine

Hasan S.M.R. Ng Kang Siong 《Neural Networks, IEEE Transactions on》1997,8(2):424-436

In this paper, emerging parallel/distributed architectures are explored for the digital VLSI implementation of an adaptive bidirectional associative memory (BAM) neural network. A single instruction stream, multiple data stream (SIMD)-based parallel processing architecture is developed for the adaptive BAM neural network, taking advantage of the inherent parallelism in BAM. This novel neural processor architecture is named the sliding feeder BAM array processor (SLiFBAM). The SLiFBAM processor can be viewed as a two-stroke neural processing engine. It has four operating modes: learn pattern, evaluate pattern, read weight, and write weight. Design of a SLiFBAM VLSI processor chip is also described. By using 2-μm scalable CMOS technology, a SLiFBAM processor chip with 4+4 neurons and eight modules of 256×5-bit local weight-storage SRAM was integrated on a 6.9 × 7.4 mm² prototype die. The system architecture is highly flexible and modular, enabling the construction of larger BAM networks of up to 252 neurons using multiple SLiFBAM chips.
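The "learn pattern" and "evaluate pattern" modes can be related to the textbook bidirectional associative memory. The sketch below uses the classical outer-product (Kosko-style) learning rule on bipolar vectors as an assumption; the abstract does not state SLiFBAM's adaptive rule, and the function names and example patterns here are hypothetical. The 4+4 layer sizes simply echo the neuron count of the prototype chip.

```python
def train_bam(pairs):
    """Learn: accumulate the outer product x^T y for every stored pair."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                weights[i][j] += x[i] * y[j]
    return weights


def threshold(total, default=1):
    """Bipolar threshold; a zero sum falls back to the given default."""
    if total > 0:
        return 1
    if total < 0:
        return -1
    return default


def recall_y(weights, x):
    """Evaluate, forward direction: drive the Y layer from an X pattern."""
    columns = range(len(weights[0]))
    return [threshold(sum(x[i] * weights[i][j] for i in range(len(x))))
            for j in columns]


def recall_x(weights, y):
    """Evaluate, backward direction: drive the X layer from a Y pattern."""
    return [threshold(sum(row[j] * y[j] for j in range(len(y))))
            for row in weights]


# Two hypothetical bipolar pattern pairs with mutually orthogonal members.
PAIRS = [
    ([1, -1, 1, -1], [1, 1, -1, -1]),
    ([1, 1, -1, -1], [1, -1, 1, -1]),
]
weights = train_bam(PAIRS)
forward = recall_y(weights, PAIRS[0][0])   # recovers the first Y pattern
backward = recall_x(weights, PAIRS[1][1])  # recovers the second X pattern
```

Each weight update and each column sum is independent of the others, which is the kind of parallelism a SIMD array could exploit.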

14.

Efficient tree codes on SIMD computer architectures

Kevin M. Olson 《Computer Physics Communications》1996,98(3):267-287

This paper describes changes made to a previous implementation of an N-body tree code developed for a fine-grained, SIMD computer architecture. These changes include (1) switching from a balanced binary tree to a balanced oct tree, (2) addition of quadrupole corrections, and (3) having the particles search the tree in groups rather than individually. An algorithm for limiting errors is also discussed. In aggregate, these changes have led to a performance increase of over a factor of 10 compared to the previous code. For problems several times larger than the processor array, the code now achieves performance levels of ~1 Gflop on the MasPar MP-2, or roughly 20% of the quoted peak performance of this machine. This percentage is competitive with other parallel implementations of tree codes on MIMD architectures. This is significant, considering the low relative cost of SIMD architectures.

15.

Integrated performance models for SPMD applications and MIMD architectures

Cremonesi P. Gennaro C. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(7):745-757

This paper introduces queuing network models for the performance analysis of SPMD (single-program, multiple-data) applications executed on general-purpose parallel architectures such as MIMD (multiple-instruction, multiple-data) machines and clusters of workstations. The models are based on the pattern of computation, communication and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processors and I/O parallelism and show the effects of different hardware and software components on performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., the number of processors, number of disks, I/O topology, etc.).

16.

A general-purpose CMOS associative processor IC and system

Stormon C.D. Troullinos N.B. Saleh E.M. Chavan A.V. Brule M.R. Oldfield J.V. 《Micro, IEEE》1992,12(6):68-78

An associative processor architecture that integrates the functionality of content-addressable memory (CAM), functional memory (FM), and associative parallel processors (APPs) in a single-chip architecture is described. The hardware design, environment and applications of the Coherent Processor, a microchannel memory device designed by combining 16 such chips, are discussed. It is shown that the processor's writable control store permits quick execution of application-specific microcoded operations.

17.

An associative accelerator for large databases

Faudemay P. Mhiri M. 《Micro, IEEE》1991,11(6):22-34

The RAPID-1 (relational access processor for intelligent data), an associative accelerator that recognizes tuples and logical formulas, is presented. It evaluates logical formulas instantiated by the current tuple, or record, and operates on whole relations or on hashing buckets. RAPID-1 uses a reduced instruction set and hardwired control and executes all comparisons in a bit-parallel mode. It speeds up the database by a significant factor and will adapt to future generations of microprocessors. The principal design issues, data structures, instruction set, architecture, environments and performance are discussed.

18.

Clustering on a hypercube multicomputer

Ranka S. Sahni S. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(2):129-137

Squared error clustering algorithms for single-instruction multiple-data (SIMD) hypercubes are presented. The algorithms are shown to be asymptotically faster than previously known algorithms and require less memory per processing element (PE). For a clustering problem with N patterns, M features per pattern, and K clusters, the algorithms complete in O(K + log NM) steps on NM-processor hypercubes. This is optimal up to a constant factor. These results are extended to the case in which NMK processors are available. Experimental results from a multiple-instruction, multiple-data (MIMD) medium-grain hypercube are also presented.
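For orientation, one iteration of squared error clustering on N patterns with M features and K clusters can be written sequentially as below. This is a plain single-processor sketch of the general technique (assign each pattern to its nearest center, recompute centers as means, measure the total squared error), not the hypercube algorithms of the paper; all function names and the sample data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign(patterns, centers):
    """Label each pattern with the index of its nearest cluster center."""
    return [min(range(len(centers)),
                key=lambda k: squared_distance(pattern, centers[k]))
            for pattern in patterns]


def update(patterns, labels, centers):
    """Recompute each center as the mean of its members.

    A cluster with no members keeps its previous center.
    """
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(patterns, labels) if label == k]
        if not members:
            new_centers.append(list(old))
            continue
        new_centers.append([sum(column) / len(members)
                            for column in zip(*members)])
    return new_centers


def squared_error(patterns, labels, centers):
    """Total squared error of the current clustering."""
    return sum(squared_distance(p, centers[label])
               for p, label in zip(patterns, labels))


# Hypothetical data: N = 4 patterns, M = 2 features, K = 2 clusters.
patterns = [[0, 0], [0, 2], [10, 0], [10, 2]]
centers = [[0, 0], [10, 0]]
labels = assign(patterns, centers)
centers = update(patterns, labels, centers)
error = squared_error(patterns, labels, centers)
```

The distance computations for different pattern-feature pairs are independent, which suggests why an arrangement with one processor per pattern feature (NM processors) is a natural fit.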

19.

A parallel sorting algorithm for a novel model of computation

Amitabha Das Louise E. Moser P. M. Melliar-Smith 《International journal of parallel programming》1991,20(5):403-419

The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors apply rules asynchronously to pairs of objects from a global object space. Application of a rule to a pair of objects results in the creation of a new object if the objects satisfy the guard of the rule. The model can be efficiently implemented as a novel MIMD array processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm is O(n log² n), comparable to that for specialized sorting networks and an improvement on the O(n^1.5) complexity of conventional mesh-connected array processors.

20.

Three-dimensional optical architecture and data-parallel algorithms for massively parallel computing

Louri A. 《Micro, IEEE》1991,11(2)

A 3-D optical architecture currently under investigation is described. This model, a single-instruction, multiple-data (SIMD) system, exploits spatial parallelism and processes 2-D binary images as fundamental computational entities using symbolic substitution logic. This system effectively implements highly structured data-parallel algorithms, such as signal and image processing, partial differential equations, multidimensional numerical transforms, and numerical supercomputing. The model includes a hierarchical mapping technique that helps design the algorithms and maps them onto the proposed optical architecture. The symbolic substitution logic and the mapping of data-parallel algorithms are discussed. The theoretical performance of the optical system was estimated and compared with that of electronic SIMD array processors. Preliminary results show that the system provides greater computational throughput and efficiency than its electronic counterparts.