Similar literature
 Found 20 similar documents; search took 31 ms
1.
The first half of this paper presents the design rationale for CNAPS, a specialized one-dimensional (1-D) processor array developed by Adaptive Solutions Inc. In this context, we discuss the problem of Amdahl's law, which severely constrains special-purpose architectures. We also discuss specific architectural decisions such as the kind of parallelism, the computational precision of the processors, on-chip versus off-chip processor memory, and, most importantly, the interprocessor communication architecture. We argue that, for our particular set of applications, a 1-D architecture gives the best “bang for the buck”, even when compared to the more traditional two-dimensional (2-D) architecture. The second half of this paper describes how several simple algorithms map to the CNAPS array. Our results show that the CNAPS 1-D array offers excellent performance over a range of image processing (IP) algorithms. We also briefly look at the performance of CNAPS as a pattern recognition engine, because many image processing and pattern recognition problems are intimately related.
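The Amdahl's law constraint mentioned in this abstract can be illustrated with a minimal sketch. The function name and the example fractions are assumptions chosen for illustration; they are not taken from the CNAPS paper.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Upper bound on speedup when only part of a workload can be parallelized.

    parallel_fraction: share of the sequential run time that parallelizes (0..1).
    n_processors: number of processors applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the original speed; the parallel part is divided
    # evenly across the processors.
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time parallelizes.
    for n in (1, 8, 64, 512):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With a 10% serial share, the bound never reaches 10 no matter how many processors are added, which is why a special-purpose array only pays off when it covers nearly all of an application's run time.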

2.
The discrete cosine transform (DCT) and the discrete sine transform (DST) have found wide applications in speech and image processing, as well as telecommunication signal processing for the purpose of data compression, feature extraction, image reconstruction, and filtering. In this paper, we present new recursive algorithms for the DCT and the DST. The proposed method is based on certain recursive properties of the DCT coefficient matrix, and can be generalized to design recursive algorithms for the 2-D DCT and the 2-D DST. Unlike previous work, these new structured recursive algorithms decompose the DCT and the DST into two balanced lower-order subproblems. Therefore, when converting our algorithms into hardware implementations, we require fewer hardware components than other recursive algorithms. Finally, we propose two parallel algorithms for accelerating the computation.
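For reference, the transform this abstract targets can be written directly from its textbook definition. The sketch below is the plain O(N²) unnormalized DCT-II, assumed here as the baseline; it is not the recursive decomposition proposed in the paper, and the function name is illustrative.

```python
import math
from typing import List, Sequence


def dct2_direct(x: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_points = len(x)
    result = []
    for k in range(n_points):
        acc = 0.0
        for n, sample in enumerate(x):
            acc += sample * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
        result.append(acc)
    return result


if __name__ == "__main__":
    # A constant signal has all of its energy in the k = 0 coefficient.
    print([round(c, 6) for c in dct2_direct([1.0, 1.0, 1.0, 1.0])])
```

Fast and recursive algorithms compute the same coefficients with fewer multiplications; a direct version like this one is mainly useful as a correctness check for them.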

3.
In this paper we examine the usefulness of a simple memory array architecture for several image processing tasks. This architecture, called the Access Constrained Memory Array Architecture (ACMAA), has a linear array of processors which concurrently access distinct rows or columns of an array of memory modules. We have developed several parallel image processing algorithms for this architecture. All the algorithms presented in this paper achieve a linear speed-up over the corresponding fast sequential algorithms. This was made possible by exploiting the efficient local as well as global communication capabilities of the ACMAA.

4.
This paper presents a digital signal processing system that produces the SEASAT synthetic-aperture radar (SAR) imagery. The system consists of a SEL 32/77 host minicomputer and three AP-120B array processors. The partitioning of the SAR processing functions and the design of software modules are described. The rationale for selecting the parallel array processor architecture and the methodology for developing the parallel processing scheme on this system are described. This system attains a SEASAT SAR data reduction speed of 2.5 h per 25-m-resolution, 4-look, 100 km × 100 km image frame. A preliminary performance evaluation of this parallel processing system and potential future applications for remote sensing data reduction are described.

5.
6.
The performance of computation-intensive digital signal processing applications running on parallel systems is highly dependent on communication delays imposed by the parallel architecture. In order to obtain a more compact task/processor assignment, a scheduling algorithm considering the communication time between processors needs to be investigated. Such applications usually contain iterative or recursive segments that are modeled as communication sensitive data flow graphs (CS-DFGs), where nodes represent computational tasks and edges represent dependencies between them. Based on the theorems derived, this paper presents a novel efficient technique called cyclo-compaction scheduling, which is applied to a CS-DFG to obtain a better schedule. This new method takes into account the data transmission time, loop-carried dependencies, and the target architecture. It implicitly uses the retiming technique (loop pipelining) and a task remapping procedure to allocate processors and to iteratively improve the parallelism while handling the underlying communication and resource constraints. Experimental results on different architectures demonstrate that this algorithm yields significant improvement over existing methods. For some applications, the final schedule length is less than one half of its initial length.

7.
Real-time image processing usually requires an enormous throughput rate and a huge number of operations. Parallel processing in the form of specialized hardware, or multiprocessing, is therefore indispensable. This paper describes a flexible programmable image processing system using the field programmable gate array (FPGA). The logic cell nature of currently available FPGAs is well suited to performing real-time bit-level image processing operations using the bit-level systolic concept. Here, we propose a novel architecture, the programmable image processing system (PIPS), which integrates this programmable hardware with digital signal processors (DSPs) to handle the bit-level as well as the arithmetic operations found in many image processing applications. The versatility of the system is demonstrated by the implementation of a 1-D median filter.
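The 1-D median filter used as the demonstration in this abstract can be sketched in software as follows. This is a plain sequential reference version, not the bit-level systolic FPGA design the paper describes; the edge-replication border handling and the function name are assumptions.

```python
from typing import List, Sequence


def median_filter_1d(signal: Sequence[float], window: int = 3) -> List[float]:
    """Replace each sample by the median of the `window` samples centered on it."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not signal:
        return []
    half = window // 2
    # Assumed border policy: repeat the first and last samples so that every
    # output position has a full window.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sorted(padded[i:i + window])[half] for i in range(len(signal))]


if __name__ == "__main__":
    # Isolated spikes (9 and 100) are suppressed by the filter.
    print(median_filter_1d([1, 9, 2, 3, 100, 4]))
```

Sorting each window is the simplest approach; hardware implementations typically replace the sort with a fixed network of compare-and-swap cells so that one output is produced per clock cycle.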

8.
The authors present a general system design method which is intended to support parallelisation of complete image processing applications using MIMD processors. The approach is based upon the utilisation of a generic system-level parallel processor architecture, the 'pipeline processor farm' (PPF), and is applicable to any embedded application with continuous input/output. The design method is illustrated using applications from the fields of computer vision and image coding. The design model accommodates several commonly exploited parallel processing paradigms, maps conveniently to the software structure of most image processing algorithms, provides incrementally scalable performance, and enables upper-bound speedups to be easily estimated from profiling data generated by the original sequential implementation of the application. It is believed that the approach has significant application in parallel embedded systems design, in the development environment, and in simulation work for computationally intensive image coding algorithms.

9.
This paper presents architectures for supporting dynamic data scaling in pipeline fast Fourier transforms (FFTs), suitable when implementing large size FFTs in applications such as digital video broadcasting and digital holographic imaging. In a pipeline FFT, data is continuously streaming and must, hence, be scaled without stalling the dataflow. We propose a hybrid floating-point scheme with tailored exponent datapath, and a co-optimized architecture between hybrid floating point and block floating point (BFP) to reduce memory requirements for 2-D signal processing. The presented co-optimization generates a higher signal-to-quantization-noise ratio and requires less memory than, for instance, convergent BFP. A 2048-point pipeline FFT has been fabricated in a standard-CMOS process from AMI Semiconductor (Lenart and Owall, 2003), and a field-programmable gate array prototype integrating a 2-D FFT core in a larger design shows that the architecture is suitable for image reconstruction in digital holographic imaging.
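The block floating point (BFP) idea referred to in this abstract is that a whole block of samples shares one exponent while each sample keeps only a short integer mantissa. The following is a minimal generic sketch under assumed parameters (8-bit signed mantissas, function names chosen here); it does not reproduce the paper's hybrid floating-point scheme or its convergent BFP baseline.

```python
import math
from typing import List, Sequence, Tuple


def bfp_encode(block: Sequence[float], mantissa_bits: int = 8) -> Tuple[List[int], int]:
    """Quantize a block to signed integer mantissas that share a single exponent."""
    peak = max((abs(v) for v in block), default=0.0)
    if peak == 0.0:
        return [0] * len(block), 0
    # math.frexp returns (m, e) with peak == m * 2**e and 0.5 <= m < 1.
    _, exponent = math.frexp(peak)
    scale = 2.0 ** (mantissa_bits - 1 - exponent)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Clamp so that rounding of the peak value cannot overflow the mantissa range.
    mantissas = [max(-limit - 1, min(limit, round(v * scale))) for v in block]
    return mantissas, exponent


def bfp_decode(mantissas: Sequence[int], exponent: int, mantissa_bits: int = 8) -> List[float]:
    """Reconstruct approximate sample values from mantissas and the shared exponent."""
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    mantissas, exponent = bfp_encode([0.5, -0.25, 0.125])
    print(mantissas, exponent, bfp_decode(mantissas, exponent))
```

Because the exponent is stored once per block rather than once per sample, memory use is close to fixed point, at the cost of reduced precision for samples much smaller than the block's peak.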

10.
In many one-dimensional (1-D) and two-dimensional (2-D) digital signal processing applications, auto-regressive (AR) models are very useful and powerful tools. Most of the development work done so far in 2-D AR modeling was limited to causal models. Recently, noncausal models have generated a great deal of interest because these models are a more natural choice for many applications in image processing. In this paper, we generalize the 1-D problems of noncausal linear-phase signal modeling and system modeling to their 2-D counterparts. The purpose of this paper is twofold. First, for homogeneous random fields, we introduce and investigate the 2-D symmetric (zero-phase) noncausal AR signal and system modeling problems. We then develop two computationally efficient algorithms for the determination of model parameters. Finally, we investigate an application in stochastic texture modeling and provide experimental results.

11.
Novel algorithmic features of multimedia applications and advances in VLSI technologies are driving forces behind the new multimedia signal processors. We propose an architecture platform which could provide high performance and flexibility, and would require less external I/O and memory access. It is comprised of array processors to be used as the hardware accelerator and RISC cores to be used as the basis of the programmable processor. It is a hierarchical and scalable architecture style which facilitates the hardware-software codesign of multimedia signal processing circuits and systems. While some control-intensive functions can be implemented using programmable CPUs, other computation-intensive functions can rely on hardware accelerators. To compile multimedia algorithms, we also present an operation placement and scheduling scheme suitable for the proposed architectural platform. Our scheme addresses data reusability and exploits local communication in order to avoid the memory/communication bandwidth bottleneck, which leads to faster program execution. Our method shows promising performance: a linear speed-up of 16 times can be achieved for the block-matching motion estimation algorithm and the true motion tracking algorithm, which underlie many multimedia applications (e.g., MPEG-2 and MPEG-4).

12.
This survey paper reviews numerous high-level transformation techniques which can be applied at the algorithm or the architecture level to improve the performance of digital signal and image processing architectures and circuits implemented using VLSI technology. Successful design of VLSI signal and image processors requires careful selection of algorithms, architectures, implementation styles, and synthesis techniques. High-level transformations can play an important role in reducing silicon area or power at the same speed or in increasing the speed for the same area. These transformations can also increase the suitability of an algorithm for a particular architectural style. The transformation techniques reviewed in this paper include pipelining, parallel processing, retiming, unfolding, folding, look-ahead, relaxed look-ahead, associativity, distributivity, and reduction in strength.

13.
A formalism and an algorithm for configuring and sequencing parallel to massively parallel processors for the application of generalised spectral analysis transforms are presented. Successive partial rotations of a base-p hypercube, where p is an arbitrary integer, are shown to produce dynamic contention-free memory allocation, in a generalised parallelism processor architecture. The approach is illustrated by factorisations involving the processing of matrices of transforms which are functions of four variables. Parallel operations are implemented as matrix multiplications. Each matrix, of dimension N × N, where N = p^n, n integer, has a structure that depends on a variable parameter k. The level of parallelism, in the form of M = p^m processors, can be chosen arbitrarily by varying m between zero and its maximum value of n - 1. The result is an equation describing the generalised parallelism factorisation as a function of the four variables n, p, k and m. Applications of the approach are shown in relation to complex matrix structures of image processing generalised spectral analysis transforms. The same approach can be applied to a much larger class of parallel and multiprocessing systems for digital signal processing applications.

14.
This paper introduces a tightly coupled topographic sensor-processor and digital signal processor (DSP) architecture for real-time visual multitarget tracking (MTT) applications. We define real-time visual MTT as the task of tracking targets contained in an input image flow at a sampling rate that is higher than the speed of the fastest maneuvers that the targets make. We utilize a sensor-processor based on the cellular neural network universal machine architecture that permits the offloading of the main image processing tasks from the DSP and introduces opportunities for sensor adaptation based on the tracking performance feedback from the DSP. To achieve robustness, the image processing algorithms running on the sensor borrow ideas from biological systems: the input is processed in different parallel channels (spatial, spatio-temporal and temporal) and the interaction of these channels generates the measurements for the digital tracking algorithms. These algorithms (running on the DSP) are responsible for distance calculation, state estimation, data association and track maintenance. The performance of the proposed system is studied using actual hardware for different video flows containing rapidly moving maneuvering targets.

15.
The general-purpose, highly parallel, cellular array processor (CAP) we developed features multiple-instruction stream, multiple-data stream (MIMD) processing and image display. Processor elements can number in several hundreds. The present system uses 256 processors. Each processor element consists of a general-purpose microprocessor, memory, and a special VLSI chip that performs parallel-processing-specific functions such as processor communication and synchronization. The VLSI chip has two independent 2 Mbyte/s common bus interfaces for data broadcasting and six 15 Mbit/s serial communication ports for local data communication. The chip also can process image data in real time for multiple processors. Use of the communication interfaces enables a variety of processor networks to be configured. One CAP application has been computer graphics, in which ray tracing is used to generate quality images.

16.
This paper provides a tutorial on the motivations, design, and applications of parallel processing applied to real-time video, illustrated by the experience gained in the implementation of the P3I machine. Its main purpose is to highlight the motivations for such a development, the basic implementation choices, the major difficulties encountered and how they have been solved. Through these studies we found that parallel processing is well suited to real-time video when programmable implementations are considered. There are many outcomes of the P3I project, ranging from architectural considerations to parallel algorithm optimizations and programming methodology. We want to emphasize three conclusions. First, programming an architecture that combines different parallel paradigms is tractable, and this heterogeneity is cost effective and efficient in terms of processing performance. Second, concerning the well-known debate about how to match parallel architectures and image processing “levels”, we conclude that the key is not to discuss Flynn's taxonomy (i.e., data versus task parallelism) but to consider how the parallelism grain evolves within a whole application. Third, we confirm that in the field of image processing, the efficiency of parallelism can only be gained if algorithm developers think “parallel”; this result seems obvious, but just consider the trends of recent RISC processors, which embed more and more parallelism while claiming compatibility with existing sequential software.

17.
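In compute-intensive applications oriented to multimedia data streams, a DSP (digital signal processor) needs not only very strong data processing capability but also high-bandwidth data input and output interfaces. Building on the enhanced Harvard architecture commonly used in traditional DSPs, this paper proposes a design for the DMA interface structure of a DSP processor, implementing DMA parallel transfer modes based on instruction parallelism and task parallelism. Experiments with six commonly used DSP algorithm programs verify that, with single-port RAM used as the on-chip memory and with instructions that access on-chip memory accounting for 42.2%-94.3% of all instructions, a DMA interface designed in this way can complete the necessary data transfers with zero overhead for the DSP. It can also perform DMA data transfers that are transparent to the programmer of the host processor, effectively improving the performance of the DSP system.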

18.
Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products on such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

19.
Some of the advances that have contributed to the realization of communication systems with programmable digital signal processors (DSPs) are described. DSP architectures are examined, covering the performance improvements resulting from advances in circuitry as well as in architecture. Architectural advances discussed are parallel processing, function generation, integrating analog circuits, and sigma-delta analog-to-digital conversion. DSP applications to high-speed modems, trellis-coded modulation, and echo cancellation are examined. The DSP implementation of a V.32 modem is described.

20.
This paper begins with a discussion of the characteristics of digital signal processing, which are the driving force behind the design of digital signal processors. The remainder of the paper describes the three generations of the TMS320 family of digital signal processors available from Texas Instruments. The evolution in architectural design of these processors and key features of each generation of processors are discussed. More detailed information is provided for the TMS320C25 and TMS320C30, the newest members in the family. The benefits and cost-performance tradeoffs of these processors become obvious when applied to digital signal processing applications, such as telecommunications, data communications, graphics/image processing, etc.
