首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
王超  曹鹏  李杰  黄伟达 《现代电子技术》2007,30(14):114-118
离散小波变换(Discrete Wavelet Transform,DWT)需要较多的运算量以及较大的存储器空间,为了使之适用于实时的图像处理应用,就需要开发特殊的架构和芯片来提高离散小波变换的运算性能。基于提升的二维DWT提出了一种新型的VLSI结构——LLSP架构,其结合逐级和基于行的架构这两者特点,带来了硬件开销和存储器空间的降低,并可以用于多提升步骤的扩展以及多级二维离散小波变换。  相似文献   

2.
Discrete Wavelet Transform: Architectures, Design and Performance Issues   总被引:3,自引:0,他引:3  
Due to the demand for real time wavelet processors in applications such as video compression [1], Internet communications compression [2], object recognition [3], and numerical analysis, many architectures for the Discrete Wavelet Transform (DWT) systems have been proposed. This paper surveys the different approaches to designing DWT architectures. The types of architectures depend on whether the application is 1-D, 2-D, or 3-D, as well as the style of architecture: systolic, semi-systolic, folded, digit-serial, etc. This paper presents an overview and evaluation of the architectures based on the criteria of latency, control, area, memory, and number of multipliers and adders. This paper will give the reader an indication of the advantages and disadvantages of each design.  相似文献   

3.
In this paper, we propose an efficient pipeline architecture for the DWT 9/7 filter defined in JPEG 2000. The proposed architecture is composed of column and row processors to perform the separable 2-D DWT. Based on the rescheduling DWT algorithm, we derive a new data flow graph to shorten the critical path. The proposed 1-D column processor requires less pipeline registers to achieve about the same critical path compared with other lifting-based architectures. For the row processor, the data dependency of each lifting step is reduced to only two computation nodes and therefore more pipeline registers can be applied to achieve higher processing speed without increasing the internal memory size in the 2-D case. That is, for an N × N image, it only requires 4N internal memory to perform the row-wise transform. For the memory bit-width analysis, we use software simulation to reduce the memory bit-width for various compression ratios. Since a portion of information from least significant bits of DWT coefficients would be discarded after EBCOT-tier2 processing, one can decrease the data width of internal memory to perform various compression ratios of JPEG 2000 coding, especially at the low-bit rates. Our simulation results suggest that it is practically possible to design the energy-aware memory architecture to further reduce the power consumption in the future work.  相似文献   

4.
Efficient architectures for 1-D and 2-D lifting-based wavelet transforms   总被引:4,自引:0,他引:4  
The lifting scheme reduces the computational complexity of the discrete wavelet transform (DWT) by factoring the wavelet filters into cascades of simple lifting steps that process the input samples in pairs. We propose four compact and efficient hardware architectures for implementing lifting-based DWTs, namely, one-dimensional (1-D) and two-dimensional (2-D) versions of what we call recursive and dual scan architectures. The 1-D recursive architecture exploits interdependencies among the wavelet coefficients by interleaving, on alternate clock cycles using the same datapath hardware, the calculation of higher order coefficients along with that of the first-stage coefficients. The resulting hardware utilization exceeds 90% in the typical case of a five-stage 1-D DWT operating on 1024 samples. The 1-D dual scan architecture achieves 100% datapath hardware utilization by processing two independent data streams together using shared functional blocks. The recursive and dual scan architectures can be readily extended to the 2-D case. The 2-D recursive architecture is roughly 25% faster than conventional implementations, and it requires a buffer that stores only a few rows of the data array instead of a fixed fraction (typically 25% or more) of the entire array. The 2-D dual scan architecture processes the column and row transforms simultaneously, and the memory buffer size is comparable to existing architectures.  相似文献   

5.
This paper presents a novel unified and programmable 2-D Discrete Wavelet Transform (DWT) system architecture, which was implemented using a Field Programmable Gate Array (FPGA)-based Nios II soft-core processor working in combination with custom hardware accelerators generated through high-level synthesis. The proposed system architecture, synthesized on an Altera DE3 Stratix III FPGA board, was developed through an iterative design space exploration methodology using Altera’s C2H compiler. Experimental results show that the proposed system architecture is capable of real-time video processing performance for grayscale image resolutions of up to 1920?×?1080 (1080p) when ran on the Altera DE3 board, and it outperforms the existing 2-D DWT architecture implementations known in literature by a considerable margin in terms of throughput. While the proposed 2-D DWT system architecture satisfies real-time performance constraints, it can also perform both forward and inverse DWT, support a number of popular DWT filters used for image and video compression and provide architecture programmability in terms of number of levels of decomposition as well as image width and height. Based from the design principles used to implement the proposed 2-D DWT system architecture, a system design guideline can be formulated for SOC designs which plan to incorporate dedicated 2-D DWT hardware acceleration.  相似文献   

6.
This paper investigates efficient hardware architectures for implementation of 1-D and 2-D discrete wavelet transforms (DWTs). The architectures are based on the lifting scheme. We propose a general structure to minimize the number of multipliers and adders for 1-D DWTs. Compared to previous conventional architectures, the architecture presented here is more efficient in terms of the required arithmetic units. Moreover, we describe a new frame scan method for a block-based 2-D DWT structure which provides a flexible trade-off between the required internal memory size and external memory access. In contrast, other 2-D DWT structures require a fixed memory size.  相似文献   

7.
郭欣  王超  曹鹏  陆燕   《电子器件》2007,30(5):1708-1711
离散小波变换在图像压缩处理中有着重要的作用,并得到了广泛的应用.与传统的基于卷积的架构相比较,基于提升的架构具有需要较少的硬件资源,占用较少的芯片面积等优点.在DSP Builder中实现了基于提升的一维离散小波变换,并通过构造相关的存储器控制逻辑,完成了二维离散小波变换架构的设计.利用该架构对图像进行离散小波变换,与软件变换的结果相比较,并计算出图像的峰值信噪比,验证了其正确性.  相似文献   

8.
Novel architectures for 1-D and 2-D discrete wavelet transform (DWT) by using lifting schemes are presented in this paper. An embedded decimation technique is exploited to optimize the architecture for 1-D DWT, which is designed to receive an input and generate an output with the low- and high-frequency components of original data being available alternately. Based on this 1-D DWT architecture, an efficient line-based architecture for 2-D DWT is further proposed by employing parallel and pipeline techniques, which is mainly composed of two horizontal filter modules and one vertical filter module, working in parallel and pipeline fashion with 100% hardware utilization. This 2-D architecture is called fast architecture (FA) that can perform J levels of decomposition for N * N image in approximately 2N2(1 - 4(-J))/3 internal clock cycles. Moreover, another efficient generic line-based 2-D architecture is proposed by exploiting the parallelism among four subband transforms in lifting-based 2-D DWT, which can perform J levels of decomposition for N * N image in approximately N2(1 - 4(-J))/3 internal clock cycles; hence, it is called high-speed architecture. The throughput rate of the latter is increased by two times when comparing with the former 2-D architecture, but only less additional hardware cost is added. Compared with the works reported in previous literature, the proposed architectures for 2-D DWT are efficient alternatives in tradeoff among hardware cost, throughput rate, output latency and control complexity, etc.  相似文献   

9.
Scalable Parallel Memory Architectures for Video Coding   总被引:1,自引:0,他引:1  
Current video compression standards, which process frames macroblock by macroblock, employ several processing functions to achieve the compression. These functions refer to data memory address space in different ways. E.g., performing motion estimation and motion compensation functions requires many times data accesses unaligned to word boundaries. On the other hand, Discrete Cosine Transformation (DCT) and inverse of it (IDCT) for 8 × 8 block can be performed first for rows and then for columns. Thus, transposition is needed between these two stages. Among other things, parallel memory architecture can provide a solution for these tasks. In our other paper, we shortly surveyed parallel memory architectures and proposed parallel memory architecture designs for different data path widths for video coding applications. In this paper, we construct video coding function examples by using the proposed parallel data memory efficiently. Furthermore, performance and implementation cost of the parallel memory architecture are estimated and compared to more conventional memory architectures. The examples are given for different data bus widths (16, 32, 64, and 128 bits). We show that the parallel memory can keep the data path fully utilized in many video coding function implementations. This ensures high-speed operation and full utilization of the processing resources.  相似文献   

10.
A Vlsi Architecture for Separable 2-D Discrete Wavelet Transform   总被引:2,自引:0,他引:2  
In this paper, an efficient semi-systolic array architecture for separable 2-D Discrete Wavelet Transform (DWT) is introduced. The semi-systolic array is applicable to any convolution that requires an arbitrary subsampling function. The semi-systolic array presents a better implementation of the convolution function of DWT. This kind of implementation offers a higher efficiency compared to regular systolic implementation when applied for 2-D DWT. The architecture has an efficiency of at least 91% which increases proportional to the number of octaves with no change in the architecture design except for minor modifications to the control logic and memory size. The propose architecture is scalable for different size of filter and different number of octave. The communication routing is minimum since data transfers are limited to immediate neighboring processors. The components of the architecture are fairly regular and consist of minimum number of computational units which makes it a good candidate for VLSI implementation.  相似文献   

11.
For visual processing applications, the two-dimensional (2-D) Discrete Wavelet Transform (DWT) can be used to decompose an image into four-subband images. However, when a single band is required for a specific application, the four-band decomposition demands a huge complexity and transpose time. This work presents a fast algorithm, namely 2-D Symmetric Mask-based Discrete Wavelet Transform (SMDWT), to address some critical issues of the 2-D DWT. Unlike the traditional DWT involving dependent decompositions, the SMDWT itself is subband processing independent, which can significantly reduce complexity. Moreover, DWT cannot directly obtain target subbands as mentioned, which leads to an extra wasting in transpose memory, critical path, and operation time. These problems can be fully improved with the proposed SMDWT. Nowadays, many applications employ DWT as the core transformation approach, the problems indicated above have motivated researchers to develop lower complexity schemes for DWT. The proposed SMDWT has been proved as a highly efficient and independent processing to yield target subbands, which can be applied to real-time visual applications, such as moving object detection and tracking, texture segmentation, image/video compression, and any possible DWT-based applications.  相似文献   

12.
针对JPEG2000硬件实现中小波变换与编码之间占用大量存储的问题,该文提出一种基于码块的存储方案。通过对码块大小片内存储最大程度的复用以及对其高效简单的调度控制,从面积和功耗两方面减小了硬件实现的开销。在实现中,采用基于行的提升变换结构和比特平面并行的编码方式,提高了效率,确保整个过程的实时处理。实验结果表明:在实时编码要求下,对分辨率为512512的图像分片进行四级9/7或者5/3小波分解,码块大小为3232,采用本文结构所用的存储量与直接使用外部存储器的方法相比可减少80%以上。整个结构已通过FPGA验证,且系统时钟可以工作在100MHz。  相似文献   

13.
The in-loop deblocking filter is one of the complex parts in H.264/AVC. It has such a large amount of computation that almost all the pixels in all the frames are involved in the worst case. In this paper, a fast deblocking filter architecture is proposed, and it can effectively save the operating time. In the proposed architecture, two 1-D filters are introduced so that the vertical filtering and the horizontal filtering can be performed at the same time, Only 120 cycles are needed for a macroblock. Our architecture is also a memory efficient one, and only one 4×4 pixels register, one 4×4 transpose array and one 16×32 b two-port (SRAM) are used as buffers in the filtering process. The simulation and synthesis results show that, with almost the same or even smaller area than some 1-D filter based architectures before, the proposed one can save more than 40% processing time. The architecture is suitable for real-time applications and can easily achieve the requirement of processing real-time video in 1080HD (high definition format, 1 920×1 088@30 fps) at 100 MHz.  相似文献   

14.
We perform a thorough data dependence and localization analysis for the discrete wavelet transform algorithm and then use it to synthesize distributed memory and control architectures for its parallel computation. The discrete wavelet transform (DWT) is characterized by a nonuniform data dependence structure owing to the decimation operation it is neither a uniform recurrence equation (URE) nor an affine recurrence equation (ARE) and consequently cannot be transformed directly using linear space-time mapping methods into efficient array architectures. Our approach is to apply first appropriate nonlinear transformations operating on the algorithm's index space, leading to a new DWT formulation on which application of linear space-time mapping can become effective. The first transformation of the algorithm achieves regularization of interoctave dependencies but alone does not lead to efficient array solutions after the mapping due to limitations associated with transforming the three-dimensional (3-D) algorithm onto one-dimensional (1-D) arrays, which is also known as multiprojection. The second transformation is introduced to remove the need for multiprojection by formulating the regularized DWT algorithm in a two-dimensional (2-D) index space. Using this DWT formulation, we have synthesized two VLSI-amenable linear arrays of LPEs computing a 6-octave DWT decomposition with latencies of M and 2M-1, respectively, where L is the wavelet filter length, and M is the number of samples in the data sequence. The arrays are modular, regular, use simple control, and can be easily extended to larger L and J. The latency of both arrays is independent of the highest octave J, and the efficiency is nearly 100% for any M with one design achieving the lowest possible latency of M  相似文献   

15.
The first half of this paper presents the design rationale for CNAPS, a specialized one-dimensional (1-D) processor array developed by Adaptive Solutions Inc. In this context, we discuss the problem of Amdahl's law which severely constrains special-purpose architectures. We also discuss specific architectural decisions such as the kind of parallelism, the computational precision of the processors, on-chip versus off-chip processor memory, and-most importantly-the interprocessor communication architecture. We argue that, for our particular set of applications, a 1-D architecture gives the best “bang for the buck”, even when compared to the more traditional two-dimensional (2-D) architecture. The second half of this paper describes how several simple algorithms map to the CNAPS array. Our results show that the CNAPS 1-D array offers excellent performance over a range of IP algorithms. We also briefly look at the performance of CNAPS as a pattern recognition engine because many image processing and pattern recognition problems are intimately related  相似文献   

16.
Memory requirements and critical path are essential for 2-D Discrete Wavelet Transform (DWT). In this paper, we address this problem and develop a memory-efficient high-speed architecture for multi-level two-dimensional DWT. First, dual data scanning technique is first adopted in 2-D 9/7 DWT processing unit to perform lifting operations, which doubles the throughputs per cycle. Second, for 2-D DWT architecture, the proposed Row Transform Unit and Column Transform Unit take advantage of input sample availabilities and provision computing resources accordingly to optimize the processing speed, in which the number of processors is further optimized to significantly reduce the hardware cost. Third, to address the problem of high cost of memory for the immediate computing results from each level and the computation time as resolution level increases, multiple proposed 2-D DWT units were combined to build a parallel multi-level architecture, which can perform up to six levels of 2-D DWT in a resolution level parallel way on any arbitrary image size at competitive hardware cost. Experimental results demonstrated that the proposed scheme achieves improved hardware performance with significantly reduced on-chip memory resource and computational time, which outperforms the-state-of-the-art schemes and makes it desirable in memory-constrained real-time application systems.  相似文献   

17.
In this paper, the design and implementation of an optimized hardware architecture in terms of speed and memory requirements for computing the tile-based 2D forward discrete wavelet transform for the JPEG2000 image compression standard, are described. The proposed architecture is based on a well-known architecture template for calculating the 2D forward discrete wavelet transform. This architecture is derived by replacing the filtering units by our previously published throughput-optimized ones and by developing a scheduling algorithm suited to the special features of our filtering units. The architecture exhibits high-performance characteristics due to the throughput-optimized filters. Also, the extra clock cycles required due to the tile-based version of the discrete wavelet transform are partially compensated by the proper scheduling of the filters. The developed scheduling algorithm results in reduced memory requirements compared with existing architectures.  相似文献   

18.
In this paper a fast implementation architecture of three-dimensional (3-D) FIR or IIR digital filters via systolic VLSI array processors is described. The modular structure presented is comprised of similar processing elements in a linear cascade configuration with local interconnections. High speed throughput rates are attained due to high concurrency, which is achieved by exploiting both pipelining and parallelism. The considered 3-D FIR and IIR filters may be used for the processing of reconstructed 3-D images and in medical imaging applications.  相似文献   

19.
This paper presents architectures for supporting dynamic data scaling in pipeline fast Fourier transforms (FFTs), suitable when implementing large size FFTs in applications such as digital video broadcasting and digital holographic imaging. In a pipeline FFT, data is continuously streaming and must, hence, be scaled without stalling the dataflow. We propose a hybrid floating-point scheme with tailored exponent datapath, and a co-optimized architecture between hybrid floating point and block floating point (BFP) to reduce memory requirements for 2-D signal processing. The presented co-optimization generates a higher signal-to-quantization-noise ratio and requires less memory than for instance convergent BFP. A 2048-point pipeline FFT has been fabricated in a standard-CMOS process from AMI Semiconductor (Lenart and Owall, 2003), and a field-programmable gate array prototype integrating a 2-D FFT core in a larger design shows that the architecture is suitable for image reconstruction in digital holographic imaging  相似文献   

20.
This paper proposes two JPEG 2000 compliant architectures: one for DWT (Discrete Wavelet Transform) and one for IWT (Integer Wavelet Transform) implementation. First of all some theoretical issues about DWT and IWT are discussed, then, starting from transforms characteristics, the architectures are presented showing both performance and cost. In the literature many DWT architectures have been proposed; our implementation is a new architecture that computes the DWT using filters of interest for the forthcoming JPEG 2000 standard. Moreover, we propose a Lifting Scheme based architecture for IWT, JPEG 2000 compliant too. The proposed architectures are able to support real-time streams: the DWT one, which is made of 20,000 cells, with an input throughput of 160 Msamples per second and a clock frequency of 160 MHz, the IWT one, consisting of 50,000 cells, with an input throughput of 4.5 Msamples per second and an internal clock frequency of 108 MHz.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号