
1.

Control generation in the design of processor arrays

Jürgen Teich Lothar Thiele 《The Journal of VLSI Signal Processing》1991,3(1-2):77-92

The problem of mapping algorithms onto regular arrays has received great attention in the past. Results are available on the mapping of regular algorithms onto systolic or wavefront arrays. On the other hand, many algorithms that can be implemented on parallel architectures are not completely regular but are composed of a set of regular subalgorithms. Recently, a class of configurable processor arrays has been proposed that allows the efficient implementation of piecewise regular algorithms. In contrast to pure systolic or wavefront arrays, they are distinguished by a dynamic configuration structure. The known design trajectories, however, cannot be applied to the design of configurable processor arrays because the functions of the processing elements and the interconnection structure are time- and space-dependent. In this paper, a systematic procedure is introduced that allows the efficient design of configurable processor arrays, including the specification of the processing elements and the generation of control signals. Control signals are propagated through the processor array. The proposed design trajectory can be used for the design of regular arrays or configurable arrays.

2.

On supercomputing with systolic/wavefront array processors

《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1984,72(7):867-884

In many scientific and signal processing applications, there are increasing demands for large-volume and/or high-speed computations which call for not only high-speed computing hardware, but also for novel approaches in computer architecture and software techniques in future supercomputers. Tremendous progress has been made on several promising parallel architectures for scientific computations, including a variety of digital filters, fast Fourier transform (FFT) processors, data-flow processors, systolic arrays, and wavefront arrays. This paper describes these computing networks in terms of signal-flow graphs (SFG) or data-flow graphs (DFG), and proposes a methodology of converting SFG computing networks into synchronous systolic arrays or data-driven wavefront arrays. Both one- and two-dimensional arrays are discussed theoretically, as well as with illustrative examples. A wavefront-oriented programming language, which describes the (parallel) data flow in systolic/wavefront-type arrays, is presented. The structural property of parallel recursive algorithms points to the feasibility of a Hierarchical Iterative Flow-Graph Design (HIFD) of VLSI Array Processors. The proposed array processor architectures, we believe, will have significant impact on the development of future supercomputers.

3.

A modularized processor LSI with a highly parallel structure for continuous speech recognition

Takahashi J. Hamaguchi S. Tansho K. Kimura T. 《Solid-State Circuits, IEEE Journal of》1991,26(6):833-843

A speech recognition processor CMOS LSI was developed as the processing element (PE) of a ring array processor previously proposed by the authors as an architecture to carry out highly parallel recognition processing with array size flexibility. The LSI has three key features: (1) a highly parallel I/O structure of triple buffers with cyclical-mode transition control methods to solve the serious problem of inter-PE data transfer overhead versus the array processing; (2) a control structure with two direct memory access (DMA) controllers to realize inter-PE data I/O processing and intra-PE processing in parallel; and (3) pipelined recognition processing at a high execution rate, realized by a pipelined structure and a balanced clock distribution design technique. These designs for the PE LSI allow high-speed recognition processing without any inter-PE data transfer overhead in the ring array processor. Combining the PE-LSI architecture with the proposed array architecture for highly parallel dynamic time warping (DTW) processing, a real-time continuous speech recognition system based on continuous dynamic programming matching using the SPLIT method for a 1000-word vocabulary can be constructed using a ring array processor consisting of 30 PEs.

4.

Guest Editorial: Design and Implementation of DSP Systems

Warren J. Gross Zhiyuan Yan 《Journal of Signal Processing Systems》2016,82(1):1-16

Convolution has been extensively used in image processing and computer vision, including image enhancement, smoothing, and structure extraction. However, the convolution operation typically requires a significant amount of computing resources. A novel one-dimensional (1D) convolution processor with a reconfigurable architecture is implemented in this study. This processor is a combination of a line buffer, controller units, and a reconfigurable and separable convolution module. The use of a reconfigurable architecture and a separable convolution approach improves the flexibility and performance of the convolution processor. The reconfigurable and separable convolution array, which is the main component of the processor, can simultaneously execute convolution operations with different kernels, with a maximum kernel size of up to 24 × 24. Experimental results show that the maximum frame rate of the processor is approximately 194 frames per second (fps), which exceeds the real-time requirement. Synthesis results show that the processor occupies 13.39 mm² at a 204 MHz system clock and consumes 419 mW at maximum kernel size at a 120 MHz system clock in SMIC 0.18 μm CMOS technology. Verification experiments on field programmable gate arrays (FPGAs) demonstrate that the processor is suitable for real-time image processing applications even for high-resolution images.

5.

A programmable data acquisition and control system for magnetic resonance imaging: application to mutually coupled surface coil arrays and temperature monitoring

Kelton J.R. Wright S.M. Magin R.L. 《IEEE transactions on bio-medical engineering》1991,38(6):608-613

A programmable data acquisition and control system was developed for use in conjunction with magnetic resonance imaging (MRI). The controller consists of two functional blocks: a host system and a remote system. The remote system resides inside the shielded room housing the magnet. The host, an IBM-compatible personal computer, is located at the technician's console. Communication between these devices is implemented over a fiber-optic RS-232 data link. This configuration allows experiments to be performed remotely by typing a series of commands at the keyboard, by programming the host to send a series of commands, or by directly programming the remote system. As an example of its capabilities, the controller was used to tune and match arrays of receiver coils for localized imaging and to record the rectal temperature of a sedated rat during image acquisition.

6.

Dynamic programming implementation on array processor architectures

K. I. Diamantaras W. H. Chou S. Y. Kung 《The Journal of VLSI Signal Processing》1996,13(1):27-35

Dynamic Programming (DP) applies to many signal and image processing applications including boundary following, the Viterbi algorithm, dynamic time warping, etc. This paper presents an array processor implementation of generic dynamic programming. Our architecture is a SIMD array attached to a host computer. The processing element of the architecture is based on an ASIC design opting for maximum speed-up. By adopting a torus interconnection network, a dual buffer structure, and a multilevel pipeline, the performance of the DP chip is expected to reach the order of several GOPS. The paper discusses both the dedicated hardware design and the data flow control of the DP chip and the total array. This work was supported in part by NATO, Scientific and Environmental Affairs Division, Collaborative Research Grant SA.5-2-05(CRG.960201)424/96/JARC-501.
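
As a rough illustration of the kind of recurrence such a DP array evaluates, here is a minimal software sketch of dynamic time warping, one of the applications the abstract names. It is not the paper's hardware design: the function name `dtw_distance`, the absolute-difference local cost, and the three-neighbour step pattern are assumptions chosen for brevity.

```python
def dtw_distance(a, b):
    """Textbook dynamic-programming DTW between two numeric sequences.

    Assumptions (not from the abstract): local cost is |x - y| and each
    cell may be reached from its left, upper, or upper-left neighbour.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Each cell depends only on three neighbours, so all cells on one anti-diagonal are independent of each other; that regularity is what makes recurrences of this shape natural candidates for array processors.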

7.

A dynamically reconfigurable interconnect for array processors

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1998,6(1):150-157

Reconfigurability of processor arrays is important for two reasons: (1) to efficiently execute different algorithms and (2) to isolate faulty processors. An array processor that is reconfigurable by the user any number of times to yield a different topology or to isolate faults is envisaged in this paper. The system has a host or controller that broadcasts a command to the interconnect to configure itself in a particular fashion. The interconnect uses static-RAM programming technology and can be programmed to different configurations by sending a different set of bits to the configuration random access memory (RAM) in the interconnect. We present three designs reconfigurable into array, ring, mesh, or Illiac mesh topologies. The first design provides no redundancy or fault tolerance. The second design is capable of graceful degradation by bypassing faulty elements. The third design is capable of graceful degradation by rerouting. The details of the interconnect and the configuration RAM contents for typical configurations are illustrated. It is seen that a reconfigurable interconnect results in a highly reconfigurable or polymorphic computer.

8.

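System software design for the LS MPP embedded computer

Wang Zhong Li Junshan Shen Xubang Wang Hui 《微电子学与计算机》2002,19(7):5-9

LS MPP is a massively parallel image processor developed in-house by the Xi'an Microelectronics Technology Institute for embedded aviation applications. Its host is an in-house 32-bit floating-point RISC chip, and its image coprocessing system is an in-house MPP coprocessor. This paper discusses the system software design of the LS MPP computer, including the assembler, the monitor program, and the C compiler.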

9.

Seasat Synthetic-Aperture Radar Data Reduction Using Parallel Programmable Array Processors

Wu Chialin Barkan Budak Karplus Walter J. Caswell Dennis 《Geoscience and Remote Sensing, IEEE Transactions on》1982,(3):352-358

This paper presents a digital signal processing system that produces the SEASAT synthetic-aperture radar (SAR) imagery. The system consists of a SEL 32/77 host minicomputer and three AP-120B array processors. The partitioning of the SAR processing functions and the design of software modules are described. The rationale for selecting the parallel array processor architecture and the methodology for developing the parallel processing scheme on this system are described. This system attains a SEASAT SAR data reduction speed of 2.5 h per 25-m resolution, 4-look, 100 km × 100 km image frame. A preliminary performance evaluation of this parallel processing system and potential future applications for remote sensing data reduction are described.

10.

Demonstration of a c.c.d. image processor for two-dimensional edge detection

Nudd G.R. Nygaard P.A. 《Electronics letters》1978,14(4):83-85

An analog technique for image processing that provides 2-dimensional edge detection is described. The algorithm implemented is the Sobel operator using a 3 × 3 array of picture elements. The implementation is unique in that a fully integrated circuit approach is employed which has the potential of being integrated directly into either an optical or infrared sensor. The n-channel charge-coupled device (c.c.d.) technology used results in a processor area of less than 0.06 cm², and can provide an increase in processing speeds of several orders of magnitude over general-purpose machines. Typical results from this processor are given and are compared with computer simulation.
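
For readers unfamiliar with the Sobel operator mentioned above, the following is a minimal digital sketch of what a 3 × 3 Sobel edge detector computes. It is a software illustration only, not the analog c.c.d. circuit; the function name `sobel_magnitude`, the |Gx| + |Gy| magnitude approximation, and the choice to leave border pixels at zero are assumptions.

```python
# Standard 3x3 Sobel kernels for the horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Return |Gx| + |Gy| for each interior pixel of a 2-D list of numbers.

    Border pixels are left at 0 (an assumption; other border policies exist).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = abs(gx) + abs(gy)
    return out
```

A uniform image yields zero everywhere, while a vertical step edge produces a strong response along the step, which is the behaviour an edge detector is expected to show.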

11.

Parallel architectures for digital optical cellular image processing

Kung-Shiuh Huang Kuznia C.B. Jenkins B.K. Sawchuk A.A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1994,82(11):1711-1723

A parallel digital optical cellular image processor (DOCIP) functionally comprises an array of identical 1-bit processing elements or cells, a fixed interconnection network, and a control unit. Four interconnection network topologies are described, including two variants of a mesh-connected array and two variants of a cellular hypercube network. The instruction sets of these single-instruction multiple-data (SIMD) machines are based on a mathematical morphological theory, binary image algebra (BIA), which provides an inherently parallel programming structure for their control. Physically, a DOCIP architecture uses a holographic optical element in a 3D free-space optical system to implement off-chip interconnections, and an optoelectronic spatial light modulator to implement a 2D array of nonlinear processing elements and (optionally) local on-chip interconnections. Two examples are given. The first, an experimental implementation of a single 54-gate cell of the DOCIP, uses an optically recorded hologram for within-cell optical interconnections, and a spatial light modulator for a 2D array of optically accessible gates. The second, a design for an efficient and more manufacturable architecture, uses a computer-generated diffractive optical element for cell-to-cell interconnections, and a 2D smart-pixel array of DOCIP cells, each cell having electronic logic and optical input/output.

12.

3D EM/MPM Image Segmentation Using an FPGA Embedded Design Implementation

Liu Chao Sun Yan Christopher Lauren 《Journal of Signal Processing Systems》2015,81(3):411-424

This paper presents a Field Programmable Gate Array (FPGA) based embedded system which is used to achieve high-speed segmentation of 3D images. Segmentation is performed using the Expectation-Maximization (EM) with Maximization of Posterior Marginals (MPM) Bayesian algorithm. This algorithm segments the 3D image using neighboring pixels based on a Markov Random Field (MRF) model. In this system, the embedded processor controls a custom circuit which performs the MPM and portions of the EM algorithm. The embedded processor completes the EM algorithm and also controls image data transmission between the host computer and on-board memory. The whole system has been implemented on a Xilinx Virtex 6 FPGA and achieved a more than 100-fold processing speedup compared with a standard desktop computer. Three new techniques were key to achieving this speed: pipelined computational cores, sixteen parallel data paths, and a novel memory interface for maximizing the external memory bandwidth.


13.

Systolic architectures for radar CFAR detectors

Hwang J.-N. Ritcey J.A. 《Signal Processing, IEEE Transactions on》1991,39(10):2286-2295

The authors discuss several advances in the evolution of radar CFAR (constant false alarm rate) detectors, from the classical mean-level detector to more recent designs using order statistics, or sorted data values. These algorithms can be implemented by modifying the existing running window order statistic filtering techniques used in signal/image processing. Although the signal processing theory of CFAR detection is well advanced, practical applications lag because of the high throughput required in radar. This intensive computational requirement is unlikely to be met by further advances in VLSI technology alone; it must result from parallel processing techniques. Systolic array architectures are proposed for several important CFAR detectors. Techniques for improving the processor utilization efficiency of the proposed array architectures are also discussed.
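
To make the "classical mean-level detector" concrete, here is a minimal sequential sketch of a cell-averaging CFAR test over a running window. It is not the systolic architecture the paper proposes; the function name `ca_cfar`, the parameter names, and the default window sizes and threshold scale are all hypothetical values picked for illustration.

```python
def ca_cfar(samples, num_train=4, num_guard=1, scale=3.0):
    """Return indices of cells that exceed scale * (mean of training cells).

    num_train training cells are taken on EACH side of the cell under test,
    skipping num_guard guard cells next to it. All defaults are illustrative
    assumptions, not values from the paper.
    """
    detections = []
    half = num_train + num_guard
    for i in range(half, len(samples) - half):
        leading = samples[i - half:i - num_guard]
        lagging = samples[i + num_guard + 1:i + half + 1]
        noise = (sum(leading) + sum(lagging)) / (2 * num_train)
        if samples[i] > scale * noise:
            detections.append(i)
    return detections
```

An order-statistic variant of the kind the abstract mentions would replace the mean with a chosen rank of the sorted training cells; the running-window structure stays the same.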

14.

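Design of a PCI-bus-based television image processing simulation system

Dong Xuefeng Yue Liqin Wang Yong 《现代电子技术》2010,33(20):117-119

To make it easier for researchers to evaluate and test new image processing algorithms while designing television image processing systems, and to reduce the hardware design complexity of evaluation and test boards, this paper proposes a solution and implements a PCI-bus-based real-time television image simulation system. The system first uses a PCI plug-in card to capture and preprocess the television image and perform video A/D conversion. It then writes the digitized image data into the computer's system memory over the high-speed PCI bus. Finally, on the computer, the image processing and control interface software is developed in a high-level language, giving software access to the PCI hardware device and supporting simulation of algorithms such as real-time processing, segmentation, and matching of digital images.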

15.

A general-purpose processor-per-pixel analog SIMD vision chip

Dudek P. Hicks P.J. 《IEEE transactions on circuits and systems. I, Regular papers》2005,52(1):13-20

A smart-sensor VLSI circuit suitable for focal-plane low-level image processing applications is presented. The architecture of the device is based on a fine-grain software-programmable SIMD processor array. Processing elements, integrated within each pixel of the imager, are implemented utilising a switched-current analog microprocessor concept. This allows the achievement of real-time image processing speeds with high efficiency in terms of silicon area and power dissipation. The prototype 21 × 21 vision chip is fabricated in a 0.6 μm CMOS technology and achieves a cell size of 98.6 μm × 98.6 μm. It executes over 1.1 giga instructions per second (GIPS) while dissipating under 40 mW of power. The architecture, circuit design and experimental results are presented in this paper.

16.

Distributed Memory Parallel Architecture Based on Modular Linear Arrays for 2-D Separable Transforms Computation

José Fridman Elias S. Manolakos 《The Journal of VLSI Signal Processing》2001,28(3):187-203

A framework for systematically mapping 2-dimensional (2-D) separable transforms into a parallel architecture consisting of fully pipelined linear array stages is presented. The resulting model architecture is characterized by its generality, high degree of modularity, high throughput, and the exclusive use of distributed memory and control. There is no central shared memory block to facilitate the transposition of intermediate results, as is commonly the case in row-column image processing architectures. Avoiding shared central memory has positive implications for speed, area, power dissipation and scalability of the architecture. The architecture presented here may be used to realize any separable 2-D transform by only changing the coefficients stored in the processing elements. Pipelined linear arrays for computing the 2-D Discrete Fourier Transform and 2-D separable convolution are presented as examples and their performance is evaluated.
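
The separability property the abstract relies on can be sketched in a few lines: a 2-D filter whose kernel is an outer product of two 1-D kernels can be computed as a row pass followed by a column pass. The sketch below uses an explicit transpose between the passes, i.e. the conventional row-column approach that the paper contrasts itself with; it does not model the distributed-memory linear arrays. All function names are hypothetical, and only the "valid" output region is computed.

```python
def filter_rows(image, kernel):
    """Slide a 1-D kernel along every row (correlation, 'valid' region only)."""
    k = len(kernel)
    return [[sum(kernel[j] * row[c + j] for j in range(k))
             for c in range(len(row) - k + 1)]
            for row in image]


def transpose(matrix):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*matrix)]


def separable_filter(image, col_kernel, row_kernel):
    """Row pass, transpose, column pass, transpose back."""
    row_pass = filter_rows(image, row_kernel)
    col_pass = filter_rows(transpose(row_pass), col_kernel)
    return transpose(col_pass)


def direct_filter(image, col_kernel, row_kernel):
    """Reference: full 2-D correlation with the outer-product kernel."""
    kr, kc = len(col_kernel), len(row_kernel)
    rows, cols = len(image), len(image[0])
    return [[sum(col_kernel[i] * row_kernel[j] * image[r + i][c + j]
                 for i in range(kr) for j in range(kc))
             for c in range(cols - kc + 1)]
            for r in range(rows - kr + 1)]
```

For kernels of length k, the two-pass form needs roughly 2k multiplications per output pixel instead of k², which is why separable formulations are attractive for hardware; the transpose step is the part that a shared central memory usually serves.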

17.

A 1.0-GHz single-issue 64-bit PowerPC integer processor

Silberman J. Aoki N. Boerstler D. Burns J.L. Sang Dhong Essbaum A. Ghoshal U. Heidel D. Hofstee P. Kyung Tek Lee Meltzer D. Hung Ngo Nowka K. Posluszny S. Takahashi O. Vo I. Zoric B. 《Solid-State Circuits, IEEE Journal of》1998,33(11):1600-1608

The organization and circuit design of a 1.0 GHz integer processor built in 0.25 μm CMOS technology are presented. A microarchitecture emphasizing parallel computation with a single late select per cycle, structured control logic implemented by read-only memories and programmable logic arrays, and a delayed-reset dynamic circuit style enabling complex functions to be implemented in a few levels of logic are among the key design choices described. A means for at-speed scan testing of this high-frequency processor by a low-speed tester is also presented.

18.

Computer architectures for image processing in the USA

Anthony P. Reeves 《Signal processing》1981,3(3):217-230

Innovative computer architectures have been developed to meet the high speed computation needs of image processing applications. These architectures take advantage of the highly structured characteristics of image data and image processing algorithms. In this paper several alternative architectures for image processing are discussed and some current hardware development projects in the USA are described. Two established computer architectures for image processing are highly parallel binary array processing and pipeline processing. Current designs involving these architectures make good use of Large Scale Integration technology. New emerging architectures, which are still in the development stage, include multi-microprocessor systems and analog processors based on charge coupled devices.

19.

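Hardware circuit design of an embedded license plate recognition system

Zhang Song Wang Fei 《现代电子技术》2012,35(10):1-3

This paper describes the hardware design of an embedded license plate recognition system based on the TMS320VC5416 digital signal processor (DSP) and a complex programmable logic device (CPLD). The SAA7111 video processing chip serves as the video A/D converter. Under CPLD control, captured image data are written to frame memory, and the DSP analyzes and processes the image data in real time. A "ping-pong" memory structure lets image acquisition and processing run in parallel. Recognition results are sent to a host computer over a serial port or stored in E2PROM, so the system can work both offline and online. The design has broad engineering application prospects in real-time high-speed image processing systems.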

20.

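Implementation of an FPGA-based high-speed sampling unit

Wang Jian 《电子科技》2012,25(8):49-51,58

This paper presents a hardware implementation of an FPGA-based high-speed sampling unit, covering the peripheral circuit design of the data acquisition device, high-speed data transfer methods and design considerations, the arithmetic processing unit design, the bus control design, and the structure of the VHDL code. The signal is format-converted, digitized by the sampler, processed and stored by the field-programmable gate array (FPGA), and then transferred under system control to complete the sampling unit's data transmission.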