1.

Xu Chao 《Journal of Computer Research and Development》2004,41(3):451-455

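Zerotree coding has been adopted by the MPEG-4 international standard. A multi-bit-plane parallel EZW zerotree coding circuit offers an efficient route to zerotree coding in real-time applications. It includes a simple, ingenious preprocessor that separates the dependencies between different bit planes, which makes parallel zerotree coding across multiple bit planes possible. In addition, within each bit plane the scheme exploits the execution characteristics of symbol assignment and skip processing to split the coding operation into two steps, which are folded into a forward and a reverse depth scan of the tree, thereby avoiding irregular scanning and processing. The design was verified on an FPGA; it can encode CIF-format video in real time and requires about 2,500 logic elements.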

2.

Scalable high-throughput variable block size motion estimation architecture

Stephen Warrington Wai-Yip Chan Subramania Sudharsanan 《Microprocessors and Microsystems》2009,33(4):319-325

Variable block size (VBS) motion compensated prediction (MCP) provides substantial rate-distortion performance gain over conventional fixed-block-size MCP and is a key feature of the H.264/AVC video coding standard. VBS–MCP requires the encoder to perform VBS motion estimation (VBSME), a computationally complex operation. In this paper, we propose a high motion vector throughput full-search VBSME architecture. High performance is achieved by performing parallel computations for multiple pixels within a macroblock, as well as computing several candidate motion vector (MV) positions in parallel. Two implementations of the architecture are examined, a four pixel-parallel implementation, and a higher performance 16 pixel-parallel implementation. A high degree of scalability is achieved by allowing for a variable length processing element array, where more processing elements yield a higher degree of candidate MV parallelism. The proposed architecture achieves a throughput exceeding current full-search VBSME architectures.
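
The full-search VBSME idea in this abstract can be illustrated in software. The sketch below is not the paper's architecture: it assumes the common approach of computing sums of absolute differences (SADs) for the sixteen 4×4 sub-blocks of a 16×16 macroblock and summing them into the larger partition sizes, and every function name and the test pattern are hypothetical. The loops over pixels and over candidate positions are the two dimensions that a hardware design could compute in parallel.

```python
def sad_4x4_grid(cur, ref, dx, dy):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock.

    cur is the 16x16 current macroblock, ref is the reference search window,
    and (dx, dy) is the candidate displacement of the macroblock inside ref.
    """
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y + dy][x + dx])
    return grid


def merge(grid, bw, bh):
    """Sum 4x4 SADs into SADs for partitions of bw x bh pixels (multiples of 4)."""
    sx, sy = bw // 4, bh // 4
    return [[sum(grid[r][c]
                 for r in range(by, by + sy)
                 for c in range(bx, bx + sx))
             for bx in range(0, 4, sx)]
            for by in range(0, 4, sy)]


def full_search(cur, ref, search_range):
    """Return the best (sad, (dx, dy)) for every partition of every block size."""
    sizes = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
    best = {}
    for dy in range(search_range):
        for dx in range(search_range):
            grid = sad_4x4_grid(cur, ref, dx, dy)  # computed once per candidate
            for size in sizes:
                for by, row in enumerate(merge(grid, *size)):
                    for bx, sad in enumerate(row):
                        key = (size, bx, by)
                        if key not in best or sad < best[key][0]:
                            best[key] = (sad, (dx, dy))
    return best


# Hypothetical test data: a 20x20 window and a macroblock cut out at (dx=3, dy=2).
ref = [[(7 * x + 13 * y + x * y) % 251 for x in range(20)] for y in range(20)]
cur = [row[3:19] for row in ref[2:18]]
best = full_search(cur, ref, search_range=5)
print(best[((16, 16), 0, 0)])
```

Because the macroblock is copied from the window at offset (3, 2), every one of the 41 partitions finds a zero SAD at that displacement.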

3.

Mayfly: A general-purpose, scalable, parallel processing architecture

Al Davis 《LISP and Symbolic Computation》1992,5(1-2):7-47

The Mayfly is a scalable general-purpose parallel processing system being designed at HP Laboratories, in collaboration with colleagues at the University of Utah. The system is intended to efficiently support parallel variants of modern programming languages such as Lisp, Prolog, and Object Oriented Programming models. These languages impose a common requirement on the hardware platform to support dynamic system needs such as runtime type checking and dynamic storage management. The main programming language for the Mayfly is a concurrent dialect of Scheme. The system is based on a distributed-memory model, and communication between processing elements is supported by message passing. The initial prototype of Mayfly will consist of 19 identical processing elements interconnected in a hexagonal mesh structure. In order to achieve the goal of scalable performance, each processing element is a parallel processor as well, which permits the application code, runtime operating system, and communication to all run in parallel. A 7 processing element subset of the prototype is presently operational. This paper describes the hardware architecture after a brief background synopsis of the software system structure.

4.

Knowledge-based environment for investigating multicomputer architectures

TG Kim BP Zeigler 《Information and Software Technology》1989,31(10):512-520

Multicomputers for massively parallel processing will eventually employ billions of processing elements, each of which will be capable of communicating with every other processing element. A knowledge-based modelling and simulation environment (KBMSE) for investigating such multicomputer architecture at a discrete-event system level is described. The KBMSE implements the discrete-event system specification (DEVS) formalism in an object-oriented programming system of Scheme (a Lisp dialect), which supports building models in a hierarchical, modular manner, a systems-oriented approach not possible in conventional simulation languages. The paper presents a framework for knowledge-based modelling and simulation by exemplifying modelling a hypercube multicomputer architecture in the KBMSE. The KBMSE has been tested on a variety of domains characterized by complex, hierarchical structures such as advanced multicomputer architectures, local area computer networks, intelligent multi-robot organizations, and biologically based life-support systems.

5.

KAIST image computing system (KICS): A parallel architecture for real-time multimedia data processing

JaeHo Hyung-Sun GeonYoung HyunWook 《Journal of Systems Architecture》2000,46(15):1403-1418

An efficient parallel architecture is proposed for high-performance multimedia data processing using multiple multimedia video processors (MVP; TMS320C80), which are fully programmable general digital signal processors (DSP). This paper describes several requirements for a multimedia data processing system and the system architecture of an image computing system called the KAIST Image Computing System (KICS). The performance of the KICS is evaluated in terms of its I/O bandwidth and the execution time for some image processing functions. An application of the KICS to the real-time Moving Picture Experts Group 2 (MPEG-2) encoder is introduced. The programmability and the high-speed data-access capability of the KICS are its most important features as a high-performance system for real-time multimedia data processing.

6.

Parallel Approaches for Singular Value Decomposition as Applied to Robotic Manipulator Jacobians

Tracy D. Braun Renard Ulrey Anthony A. Maciejewski Howard Jay Siegel 《International journal of parallel programming》2002,30(1):1-35

The system of equations that governs kinematically redundant robotic manipulators is commonly solved by finding the singular value decomposition (SVD) of the corresponding Jacobian matrix. This can require a considerable amount of time to compute; thus, a parallel SVD algorithm that reduces execution time is sought. The approach employed here lends itself to parallelization by using Givens rotations and information from previous decompositions. The key contribution of this research is the presentation and implementation of parallel SVD algorithms to compute the SVD for a set of Jacobians that represent various different joint failure scenarios. Results from implementation of the algorithm on a MasPar MP-1, an IBM SP2, and the PASM prototype parallel computers are compared. Specific issues considered for each implementation include: how data is mapped to the processing elements, the effect that increasing the number of processing elements has on execution time, the type of parallel architecture used, and trade-offs between modes of parallelism.
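
The abstract names Givens rotations as the building block that makes the SVD parallelizable. As a minimal illustration of that building block only (not the authors' parallel algorithm), the sketch below applies one Givens rotation to two rows of a small matrix so that a chosen entry becomes zero. The function names and the 2×2 example matrix are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def apply_givens_rows(matrix, i, k, col):
    """Rotate rows i and k of matrix in place so that matrix[k][col] becomes zero."""
    c, s = givens(matrix[i][col], matrix[k][col])
    for j in range(len(matrix[0])):
        upper, lower = matrix[i][j], matrix[k][j]
        matrix[i][j] = c * upper + s * lower
        matrix[k][j] = -s * upper + c * lower
    return matrix


# Hypothetical 2x2 example: zero the entry in row 1, column 0.
J = [[3.0, 1.0], [4.0, 2.0]]
apply_givens_rows(J, 0, 1, 0)
print(J)
```

Each rotation touches only two rows, so rotations acting on disjoint row pairs are independent of one another, which is the property that lends itself to parallel execution.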

7.

Design of a High-Speed 3D Real-Time Image Frame Buffer

Ma Lan Shen Xiaoyun Hou Chunping 《Journal of Image and Graphics》2000,5(8):703-705

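This paper proposes a new design method for a high-speed real-time 3D image system. Based on an analysis of image storage algorithms and the concurrency inherent in them, it proposes a pipelined multi-SIMD parallel architecture for 3D image processing. The architecture lets the image processors access frame-buffer pixels simultaneously by row, by column, or by an arbitrary rectangular block, which resolves the spurious artifacts that otherwise appear at block boundaries when an image is partitioned for parallel processing.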

8.

A self-adjusting dynamic logic module

Tony R. Martinez Douglas M. Campbell 《Journal of Parallel and Distributed Computing》1991,11(4)

This paper presents an ASOCS (adaptive self-organizing concurrent system) model for massively parallel processing of incrementally defined rule systems in such areas as adaptive logic, robotics, logical inference, and dynamic control. An ASOCS is an adaptive network composed of many simple computing elements operating asynchronously and in parallel. This paper focuses on Adaptive Algorithm 2 (AA2) and details its architecture and learning algorithm. AA2 has significant memory and knowledge maintenance advantages over previous ASOCS models. An ASOCS can operate in either a data processing mode or a learning mode. During the learning mode, the ASOCS is given a new rule expressed as a Boolean conjunction. The AA2 learning algorithm incorporates the new rule in a distributed fashion in a short, bounded time. During the data processing mode, the ASOCS acts as a parallel hardware circuit.

9.

A reconfigurable computing framework for multi-scale cellular image processing

Reid Porter Jan Frigo Al Conti Neal Harvey Garrett Kenyon Maya Gokhale 《Microprocessors and Microsystems》2007,31(8):546-563

Cellular computing architectures represent an important class of computation that is characterized by simple processing elements, local interconnect and massive parallelism. These architectures are a good match for many image and video processing applications and can be substantially accelerated with Reconfigurable Computers. We present a flexible software/hardware framework for design, implementation and automatic synthesis of cellular image processing algorithms. The system provides an extremely flexible set of parallel, pipelined and time-multiplexed components which can be tailored through reconfigurable hardware for particular applications. The most novel aspects of our framework include a highly pipelined architecture for multi-scale cellular image processing as well as support for several different pattern recognition applications. In this paper, we will describe the system in detail and present our performance assessments. The system achieved speed-up of at least 100× for computationally expensive sub-problems and 10× for end-to-end applications compared to software implementations.

10.

The image understanding architecture

Charles C. Weems Steven P. Levitan Allen R. Hanson Edward M. Riseman David B. Shu J. Gregory Nash 《International Journal of Computer Vision》1989,2(3):251-282

This paper provides an overview of the Image Understanding Architecture (IUA), a massively parallel, multilevel system for supporting real-time image understanding applications and research in knowledge-based computer vision. The design of the IUA is motivated by considering the architectural requirements for integrated real-time vision in terms of the type of processing element, control of processing, and communication between processing elements. The IUA integrates parallel processors operating simultaneously at three levels of computational granularity in a tightly coupled architecture. Each level of the IUA is a parallel processor that is distinctly different from the other two levels, designed to best meet the processing needs at each of the corresponding levels of abstraction in the interpretation process. Communication between levels takes place via parallel data and control paths. The processing elements within each level can also communicate with each other in parallel, via a different mechanism at each level that is designed to meet the specific communication needs of each level of abstraction. An associative processing paradigm has been utilized as the principal control mechanism at the low and intermediate levels. It provides a simple yet general means of managing massive parallelism, through rapid responses to queries involving partial matches of processor memory to broadcast values. This has been enhanced with hardware operations that provide for global broadcast, local compare, Some/None response, responder count, and single responder select. To demonstrate how the IUA may be used for vision processing, several sample algorithms and a typical interpretation scenario on the IUA are presented. We believe that the IUA represents a major step toward the development of a proper combination of integrated processing power, communication, and control required for real-time computer vision.
A proof-of-concept prototype of 1/64th of the IUA is currently being constructed by the University of Massachusetts and Hughes Research Laboratories.

11.

OR-parallel execution of Prolog on a multi-sequential machine

Khayri A. M. Ali 《International journal of parallel programming》1986,15(3):189-214

Based on extending the sequential execution model of Prolog to include parallel execution, we present a method for OR-parallel execution of Prolog on a multiprocessor system. The method reduces the overhead incurred by parallel processing. It allows many processing elements (PEs) to process simultaneously a common branch of a search tree, and each of these PEs creates its local environment and selects a subtree for processing without communication. The run-time overhead is small: simple and efficient operations for selecting the proper subtree. Communication is necessary only when some PEs have exhausted their search spaces and there are others still searching for solutions. The method is able to utilize most of the technology devised for sequential implementation of Prolog. It is optimized for an architecture that supports broadcast copying.

12.

The Mod 2 Neurocomputer system design

Mumford M.L. Andes D.K. Kern L.L. 《Neural Networks, IEEE Transactions on》1992,3(3):423-433

The Mod 2 Neurocomputer, the latest in a series of neurocomputing systems at the Naval Air Warfare Center Weapons Division, is a neural network processing system incorporating individual neural networks as subsystems in a layered hierarchical architecture. The Mod 2 is designed to support parallel processing of image data at sensor (real-time) rates. Basic concepts implemented in the Mod 2 are (1) maintaining data representations as frames of data processed as a whole at each layer, (2) a general interconnect design supporting data transfer requirements such as generation of parallel pathways, fan-up/fan-down, and feedforward and feedback, and (3) a neuroprocessing block supporting several neural network paradigms. The basis for the system implementation is the Intel 80170NX neural network processor. Examples are given for the implementation strategy for neural substructures such as the multilayer perceptron and temporal and spatiotemporal image processing, as well as the implementation of a multifunction processing system.

13.

Toward advanced parallel processing: exploiting parallelism at task and instruction levels

Fukuda A. Murakami K. Tomita S. 《Micro, IEEE》1991,11(4)

The status of two projects that entail the development of a reconfigurable parallel processor system with 128 Sparc microprocessors and a superscalar processor with four operations proceeding in parallel is discussed. The design principles, system configuration, processing element, network architecture, and memory architecture of the reconfigurable processors (called KRPP) are described. The operating system for KRPP is discussed. The architecture for the superscalar (called a dynamically hazard-resolved, statically code-scheduled, nonuniform superscalar) is presented.

14.

An optimum parallel architecture for high-speed real-time digital signal processing

Lang G.R. Dharssi M. Longstaff F.M. Longstaff P.S. Metford P.A.S. Rimmer M.T. 《Computer》1988,21(2):47-57

The authors describe a parallel processing architecture for real-time digital signal processing that has demonstrated virtually 100% data processing efficiency in a number of areas. The Teamed-Architecture Signal Processor (T-ASP) is a field-proven, commercially available optimal system solution to the extremely high computational and I/O rates encountered in modern digital-signal-processing environments. The design of T-ASP involves the consideration and implementation of many architectural concepts used to enhance the performance of a computer, including programmability, parallel processing, vector processing and pipelining, memory interleaving, double cache memories, multiple high-speed I/O interfaces, and segmentation of the processors for elimination of both CPU and data-handling overhead. The authors discuss hardware architecture design and implementation; hardware management; and software architecture design and implementation.

15.

Scalable mpNoC for massively parallel systems – Design and implementation on FPGA

M. Baklouti Y. Aydi Ph. Marquet J.L. Dekeyser M. Abid 《Journal of Systems Architecture》2010,56(7):278-292

The high chip-level integration enables the implementation of large-scale parallel processing architectures with 64 and more processing nodes on a single chip or on an FPGA device. These parallel systems require a cost-effective yet high-performance interconnection scheme to provide the needed communications between processors. The massively parallel Network on Chip (mpNoC) was proposed to address the demand for parallel irregular communications for massively parallel processing System on Chip (mppSoC). Targeting FPGA-based design, an efficient mpNoC low level RTL implementation is proposed taking into account design constraints. The proposed network is designed as an FPGA based Intellectual Property (IP) able to be configured in different communication modes. It can communicate between processors and also perform parallel I/O data transfer, which is clearly a key issue in an SIMD system. The mpNoC RTL implementation performs well in terms of area, throughput and power consumption, which are important metrics for an on-chip implementation. mpNoC is a flexible architecture that is suitable for use in FPGA-based parallel systems. This paper introduces the basic mppSoC architecture. It mainly focuses on the mpNoC flexible IP based design and its implementation on FPGA. The integration of mpNoC in mppSoC is also described. Implementation results on a Stratix II FPGA device are given for three data-parallel applications run on mppSoC. The good performance obtained justifies the effectiveness of the proposed parallel network. It is shown that the mpNoC is a lightweight parallel network, making it suitable for both small and large FPGA-based parallel systems.

16.

FPGA-based architecture for hardware compression/decompression of wide format images (total citations: 1, self-citations: 0, citations by others: 1)

M. Akil L. Perroton T. Grandpierre 《Journal of Real-Time Image Processing》2006,1(2):163-170

In this article, we present a popular lossless compression/decompression algorithm, GZIP, and the study to implement it on an FPGA-based architecture, the ADM-XRC board from ALPHA DATA parallel system ltd. The algorithm is lossless, and applied to “bi-level” images of large size (A0 format). It ensures a minimum compression rate for the images we are considering. It aims to decrease storage requirements and transfer times, which are critical for wide format printing systems. In the wide format document industry, raster data are most of the time processed in an uncompressed format, in order to apply processing (P) before printing (p). An example of a copy chain is composed of scanner, set of processing operations, storage, link and printer. We propose to use a compressed format as the new data-flow representation to improve the performance of the printing system. For example, the compression (C) is applied as soon as the data are produced by the scanner, and decompression (D) is performed at the last stage, before printing. The set of processing operations is applied to compressed images. The proposed architecture for the compressor is based on a hash table and the decompressor is based on a parallel decoder of the Huffman codes. We implemented the proposed architecture for compression and decompression algorithms on FPGA Xilinx Virtex XCV 400.
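
A software analogue of the compress-early, decompress-late chain described in this abstract can be sketched with Python's standard `zlib` module, which implements DEFLATE, the algorithm used inside GZIP. This is only an illustration of lossless round-tripping on a bi-level raster; it says nothing about the FPGA design, and the packing helper and the synthetic image are hypothetical.

```python
import zlib


def pack_bilevel(rows):
    """Pack rows of 0/1 pixels into bytes, 8 pixels per byte, MSB first."""
    out = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            byte <<= 8 - len(chunk)  # pad a short final chunk with zero bits
            out.append(byte)
    return bytes(out)


# Hypothetical bi-level image: a black rectangle on a white 256x64 page.
width, height = 256, 64
rows = [[1 if 32 <= x < 96 and 16 <= y < 48 else 0 for x in range(width)]
        for y in range(height)]

raw = pack_bilevel(rows)                 # "scanner" output
compressed = zlib.compress(raw, 9)       # compression (C) right after scanning
restored = zlib.decompress(compressed)   # decompression (D) just before printing

print(len(raw), len(compressed), restored == raw)
```

The round trip is exact, and the compressed form is what would be stored and transferred between the two ends of the chain.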

17.

A Parallel, Configurable VLSI Architecture for HEVC Entropy Coding

Lu Wei Yu Ningmei Nan Jianghan Wang Dongfang 《Computer Engineering and Applications》2014,(3):121-124,144

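A parallel, configurable VLSI architecture for HEVC entropy coding is proposed. Based on an analysis of the algorithms in the HEVC reference software, CABAC coding in HEVC is handled with highly parallel syntax-element processing, and a hardware structure for processing CABAC syntax elements in parallel is designed. A configurable PE-array structure is also adopted, which raises throughput and computational efficiency while keeping the area of the VLSI design in check. Logic synthesis with the SMIC 0.13 μm process library gives a total gate count of 16.2 K and 20.8 KB of on-chip memory. At a clock frequency of 300 MHz, the design can process 3840×2160 video sequences at 30 frames/s.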
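
The throughput figures quoted in this abstract (300 MHz clock, 3840×2160 at 30 frames/s) imply a tight cycle budget, which a few lines of arithmetic make concrete. The calculation below is a rough sketch: it assumes the budget is spread evenly over luma pixels, whereas an entropy coder really works on syntax elements and bins, so the per-pixel figure is only an order-of-magnitude guide. The variable names are illustrative.

```python
from fractions import Fraction

WIDTH, HEIGHT, FPS = 3840, 2160, 30   # video format stated in the abstract
CLOCK_HZ = 300_000_000                # clock frequency stated in the abstract

pixels_per_second = WIDTH * HEIGHT * FPS
cycles_per_frame = CLOCK_HZ // FPS
# Assumption: the cycle budget is divided evenly over luma pixels.
cycles_per_pixel = Fraction(CLOCK_HZ, pixels_per_second)

print(pixels_per_second)                  # luma pixels to cover each second
print(cycles_per_frame)                   # clock cycles available per frame
print(round(float(cycles_per_pixel), 3))  # average cycles available per pixel
```

With little more than one clock cycle available per pixel on average, a design that handled one syntax element at a time would have almost no headroom, which is consistent with the abstract's emphasis on parallel syntax-element processing.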

18.

Elastic superposition task mapping for NoC-based reconfigurable systems

《Microprocessors and Microsystems》2017

With technology progress, more and more applications are integrated into a single chip. This requires a large number of processing elements (PEs) in a system, such that computation can be effectively enhanced through parallel processing. To support more efficient parallel processing, the Network-on-Chip (NoC) is being increasingly adopted as an interconnection architecture. Nevertheless, for NoC-based reconfigurable systems, the issue of mapping tasks to the PEs becomes more complex, due to the characteristic of hardware reconfiguration. This work proposes a novel Elastic Superposition Mapping (ESM) that introduces a useful PE reservation heuristic along with dynamic cross-application superposition. The ESM can provide a great elasticity for an NoC-based reconfigurable system to map more applications. Thus, the task load on PE will increase. Experiments show that, compared to the state-of-the-art mapping methods, 7% to 49% more applications can be executed, the average task load on PE can be increased by 5.5% to 56%, and the application waiting time can be reduced by 11% to 54%.

19.

A parallel processing VLSI BAM engine

Hasan S.M.R. Ng Kang Siong 《Neural Networks, IEEE Transactions on》1997,8(2):424-436

In this paper, emerging parallel/distributed architectures are explored for the digital VLSI implementation of the adaptive bidirectional associative memory (BAM) neural network. A single instruction stream many data stream (SIMD)-based parallel processing architecture is developed for the adaptive BAM neural network, taking advantage of the inherent parallelism in BAM. This novel neural processor architecture is named the sliding feeder BAM array processor (SLiFBAM). The SLiFBAM processor can be viewed as a two-stroke neural processing engine. It has four operating modes: learn pattern, evaluate pattern, read weight, and write weight. Design of a SLiFBAM VLSI processor chip is also described. By using 2-μm scalable CMOS technology, a SLiFBAM processor chip with 4+4 neurons and eight modules of 256×5-bit local weight-storage SRAM was integrated on a 6.9×7.4 mm² prototype die. The system architecture is highly flexible and modular, enabling the construction of larger BAM networks of up to 252 neurons using multiple SLiFBAM chips.
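
The "learn pattern" and "evaluate pattern" modes mentioned in this abstract correspond to the two basic operations of a BAM. The sketch below uses the classic outer-product (Hebbian) BAM formulation with bipolar vectors, which is an assumption: the paper's adaptive learning rule and its hardware data flow may differ. The 4+4 neuron size mirrors the chip described above, and the stored patterns and function names are hypothetical.

```python
def learn(pairs):
    """Learn mode: build weights W[i][j] as the sum over pairs of x[i] * y[j]."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                weights[i][j] += x[i] * y[j]
    return weights


def threshold(value, previous):
    """Bipolar threshold; a zero net input keeps the neuron's previous state."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return previous


def evaluate(weights, x, max_iters=10):
    """Evaluate mode: alternate x -> y and y -> x passes until the pair is stable."""
    n, m = len(weights), len(weights[0])
    y = [1] * m
    for _ in range(max_iters):
        new_y = [threshold(sum(x[i] * weights[i][j] for i in range(n)), y[j])
                 for j in range(m)]
        new_x = [threshold(sum(weights[i][j] * new_y[j] for j in range(m)), x[i])
                 for i in range(n)]
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y


# Two hypothetical pattern pairs for a 4+4 neuron network.
pairs = [([1, -1, 1, -1], [1, 1, -1, -1]),
         ([1, 1, -1, -1], [1, -1, 1, -1])]
W = learn(pairs)
recalled = [evaluate(W, x) for x, _ in pairs]
print(recalled)
```

Every neuron in a layer computes its weighted sum independently of the others, which is the inherent parallelism that a SIMD array can exploit.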

20.

A context switching streaming memory architecture to accelerate a neocortex model

Christopher N. Vutsinas Tarek M. Taha Kenneth L. Rice 《Microprocessors and Microsystems》2009,33(2):117-128

A novel architecture to accelerate a neocortex inspired cognitive model is presented. The architecture utilizes a collection of context switchable processing elements (PEs). This enables time multiplexing of nodes in the model onto available PEs. A streaming memory system is designed to enable high-throughput computation and efficient use of memory resources. Several scheduling algorithms were examined to efficiently assign network nodes to the PEs. Multiple parallel FPGA-accelerated implementations were evaluated on a Cray XD1. Networks of varying complexity were tested and indicate that hardware acceleration can provide an average throughput gain of 184 times over equivalent parallel software implementations.