Similar literature
 20 similar articles found (search time: 15 ms)
1.
王超  曹鹏  李杰  黄伟达 《现代电子技术》2007,30(14):114-118
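The discrete wavelet transform (DWT) requires a large amount of computation and considerable memory. To make it suitable for real-time image processing, dedicated architectures and chips are needed to improve DWT performance. Based on the lifting-based 2-D DWT, this paper proposes a new VLSI structure, the LLSP architecture, which combines the characteristics of level-by-level and line-based architectures. It reduces hardware cost and memory requirements, and it can be extended to multiple lifting steps and to multi-level 2-D DWT.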

2.
This paper presents a novel unified and programmable 2-D Discrete Wavelet Transform (DWT) system architecture, which was implemented using a Field Programmable Gate Array (FPGA)-based Nios II soft-core processor working in combination with custom hardware accelerators generated through high-level synthesis. The proposed system architecture, synthesized on an Altera DE3 Stratix III FPGA board, was developed through an iterative design space exploration methodology using Altera’s C2H compiler. Experimental results show that the proposed system architecture is capable of real-time video processing performance for grayscale image resolutions of up to 1920 × 1080 (1080p) when run on the Altera DE3 board, and it outperforms the existing 2-D DWT architecture implementations known in the literature by a considerable margin in terms of throughput. While the proposed 2-D DWT system architecture satisfies real-time performance constraints, it can also perform both forward and inverse DWT, support a number of popular DWT filters used for image and video compression, and provide architecture programmability in terms of the number of levels of decomposition as well as image width and height. Based on the design principles used to implement the proposed 2-D DWT system architecture, a system design guideline can be formulated for SoC designs that plan to incorporate dedicated 2-D DWT hardware acceleration.

3.
Novel architectures for 1-D and 2-D discrete wavelet transform (DWT) using lifting schemes are presented in this paper. An embedded decimation technique is exploited to optimize the architecture for 1-D DWT, which is designed to receive an input and generate an output with the low- and high-frequency components of the original data being available alternately. Based on this 1-D DWT architecture, an efficient line-based architecture for 2-D DWT is further proposed by employing parallel and pipeline techniques. It is mainly composed of two horizontal filter modules and one vertical filter module, working in parallel and pipeline fashion with 100% hardware utilization. This 2-D architecture is called fast architecture (FA) and can perform J levels of decomposition for an N × N image in approximately $2N^2(1 - 4^{-J})/3$ internal clock cycles. Moreover, another efficient generic line-based 2-D architecture is proposed by exploiting the parallelism among the four subband transforms in lifting-based 2-D DWT. It can perform J levels of decomposition for an N × N image in approximately $N^2(1 - 4^{-J})/3$ internal clock cycles; hence, it is called high-speed architecture. The throughput rate of the latter is twice that of the former 2-D architecture, at only a small additional hardware cost. Compared with the works reported in previous literature, the proposed architectures for 2-D DWT are efficient alternatives in the tradeoff among hardware cost, throughput rate, output latency, and control complexity.
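The two cycle counts quoted in this abstract, 2N^2(1 - 4^(-J))/3 and N^2(1 - 4^(-J))/3, are what a geometric series over the decomposition levels gives if each level works on an image a quarter the size of the previous one. The sketch below checks this numerically. It assumes (this is not stated in the abstract) that the fast architecture consumes 2 samples per clock cycle and the high-speed architecture 4, and that N is a power of two; the function names are illustrative only.

```python
from fractions import Fraction


def total_cycles(n, levels, samples_per_cycle):
    """Sum the per-level cycle counts of a multi-level 2-D DWT.

    Level j (counting from 0) operates on an (n / 2**j) x (n / 2**j)
    low-pass subband. samples_per_cycle is an assumed throughput figure.
    """
    return sum(
        Fraction((n >> j) ** 2, samples_per_cycle) for j in range(levels)
    )


def closed_form(n, levels, samples_per_cycle):
    """Geometric-series closed form: (4 N^2 / (3 p)) * (1 - 4^-J)."""
    return Fraction(4 * n * n, 3 * samples_per_cycle) * (
        1 - Fraction(1, 4 ** levels)
    )


if __name__ == "__main__":
    for name, p in (("FA", 2), ("HA", 4)):
        print(name, total_cycles(512, 3, p), closed_form(512, 3, p))
```

With p = 2 the closed form reduces to 2N^2(1 - 4^(-J))/3, and with p = 4 to N^2(1 - 4^(-J))/3, matching the figures in the abstract.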

4.
The implementation of the memory for storing image and transform coefficients in 2-D DWT processing systems using a more cost-effective external memory module such as DDR DRAM is shown to suffer from an effective memory bandwidth that is significantly lower than the memory system peak bandwidth if the conventional direct logical-to-physical memory address mapping is adopted. The low effective memory bandwidth is caused by the frequent occurrence of memory overhead cycles, which in turn is closely related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for the 2-D DWT processing of video. An analysis of the logical memory access patterns of multi-level 2-D DWT is carried out, and an enhanced logical-to-physical memory mapping scheme that minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated, and its performance in terms of effective memory access bandwidth is evaluated and compared with the conventional direct mapping scheme.
Soon-Chieh Lim
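To illustrate why the access pattern matters for DRAM, the following toy model counts how often consecutive accesses fall in a different DRAM page (row), since each such change costs overhead cycles. It is only a sketch under strong assumptions: a single bank with one open page, a 64 × 64 image, 64 words per page, and a hypothetical 8 × 8 tiled mapping used purely for comparison. The mapping scheme actually proposed in the paper is not described in the abstract and is not reproduced here.

```python
WIDTH = HEIGHT = 64   # assumed image size in words
PAGE_WORDS = 64       # assumed DRAM page (row) size in words
TILE = 8              # hypothetical tile edge: 8 x 8 words fill one page


def page_switches(addresses, page_words):
    """Count accesses that land in a different page than the previous one."""
    switches, current = 0, None
    for address in addresses:
        page = address // page_words
        if page != current:
            switches += 1
            current = page
    return switches


def direct_address(x, y):
    """Conventional direct (row-major) logical-to-physical mapping."""
    return y * WIDTH + x


def tiled_address(x, y):
    """Hypothetical mapping that stores each TILE x TILE block in one page."""
    tiles_per_row = WIDTH // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE)


def switches_for(mapping, by_columns):
    """Page switches for a full row-wise or column-wise scan of the image."""
    if by_columns:
        coords = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]
    else:
        coords = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
    return page_switches([mapping(x, y) for x, y in coords], PAGE_WORDS)


if __name__ == "__main__":
    for name, mapping in (("direct", direct_address), ("tiled", tiled_address)):
        print(name, switches_for(mapping, False), switches_for(mapping, True))
```

In this model the direct mapping is ideal for row-wise scans but changes page on every access during column-wise scans, whereas the tiled mapping gives the same moderate cost in both directions. A separable 2-D DWT needs both scan directions, which is the kind of trade-off the abstract refers to.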

5.
For visual processing applications, the two-dimensional (2-D) Discrete Wavelet Transform (DWT) can be used to decompose an image into four subband images. However, when only a single band is required for a specific application, the four-band decomposition incurs high complexity and a long transpose time. This work presents a fast algorithm, namely the 2-D Symmetric Mask-based Discrete Wavelet Transform (SMDWT), to address some critical issues of the 2-D DWT. Unlike the traditional DWT, which involves dependent decompositions, the SMDWT processes each subband independently, which can significantly reduce complexity. Moreover, the DWT cannot directly obtain the target subbands, which leads to extra waste in transpose memory, critical path, and operation time. These problems can be resolved with the proposed SMDWT. Many applications now employ the DWT as the core transformation approach, and the problems indicated above have motivated researchers to develop lower-complexity schemes for the DWT. The proposed SMDWT has been shown to be a highly efficient and independent process for yielding target subbands, and it can be applied to real-time visual applications such as moving object detection and tracking, texture segmentation, image/video compression, and other DWT-based applications.

6.
In this paper, we propose an efficient pipeline architecture for the DWT 9/7 filter defined in JPEG 2000. The proposed architecture is composed of column and row processors to perform the separable 2-D DWT. Based on the rescheduling DWT algorithm, we derive a new data flow graph to shorten the critical path. The proposed 1-D column processor requires fewer pipeline registers to achieve about the same critical path compared with other lifting-based architectures. For the row processor, the data dependency of each lifting step is reduced to only two computation nodes, and therefore more pipeline registers can be applied to achieve higher processing speed without increasing the internal memory size in the 2-D case. That is, for an N × N image, it requires only 4N internal memory to perform the row-wise transform. For the memory bit-width analysis, we use software simulation to reduce the memory bit-width for various compression ratios. Since a portion of the information in the least significant bits of the DWT coefficients would be discarded after EBCOT tier-2 processing, one can decrease the data width of the internal memory for various compression ratios of JPEG 2000 coding, especially at low bit rates. Our simulation results suggest that it is practically possible to design an energy-aware memory architecture to further reduce the power consumption in future work.

7.
A novel decomposed lifting scheme (DLS) is presented to perform the one-dimensional (1D) discrete wavelet transform (DWT) with consistent data flow in both the row and column dimensions. Based on the proposed DLS, intermediate data can be transferred seamlessly between the column processor and the row processor in the hardware implementation of the two-dimensional (2D) DWT, resulting in a reduction of on-chip memory, output latency, and control complexity. Moreover, the implementation of 2D DWT can be easily extended to achieve higher processing speed with a controlled increase in hardware cost. Memory-efficient and high-speed architectures are proposed to implement 2D DWT for JPEG2000, which are called fast architecture (FA) and high-speed architecture (HA). FA and HA can perform 2D DWT in $N^2/2$ and $N^2/4$ clock cycles for an N × N image, respectively, while the required internal memory is only 4N for 9/7 DWT and 2N for 5/3 DWT. Compared with the works reported in previous literature, the proposed designs provide excellent performance in hardware cost, control complexity, output latency, and computing time. The proposed designs were implemented to process 2D 9/7 DWT in an SMIC 0.18 μm CMOS logic process with 4 KB of internal memory for an image size of 512 × 512. The areas are only 999137 μm² and 1333054 μm² for FA and HA, respectively, while the operating frequency can be up to 150 MHz.

8.
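Hardware design of a multiplier-free, high-performance 9/7 discrete wavelet transform filter   Total citations: 1, self-citations: 0, citations by others: 1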
马艳萍  王剑峰  刘云 《电讯技术》2006,46(5):200-204
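This paper proposes a lifting-based VLSI design method for an efficient, real-time implementation of the 9/7 biorthogonal discrete wavelet transform filter used in JPEG2000. At the same precision, the resulting structure greatly reduces the amount of computation, achieves high overall speed, has low hardware cost and low memory requirements, and reaches 100% hardware utilization. The system was described in Verilog HDL and synthesized in the ISE 4.1 environment, targeting a Xilinx xcv50e-cs144-8 device.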

9.
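The lifting-based discrete wavelet transform requires less computation than the traditional convolution-based approach and is easier to implement in VLSI. This paper proposes a lifting-based VLSI design method for an efficient, real-time implementation of the 9/7 biorthogonal discrete wavelet transform filter used in JPEG2000. At the same precision, the resulting structure reduces the amount of computation, achieves high overall speed, has low hardware cost and low memory requirements, and reaches 100% hardware utilization. The system was described in Verilog HDL and synthesized in the ISE 4.1 environment, targeting a Xilinx XCV50e-cs144-8 device.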

10.
A direct method for the computation of 2-D DCT/IDCT on a linear-array architecture is presented. The 2-D DCT/IDCT is first converted into its corresponding 1-D DCT/IDCT problem through proper input/output index reordering. Then, a new coefficient matrix factorisation is derived, leading to a cascade of several basic computation blocks. Unlike other previously proposed high-speed 2-D N × N DCT/IDCT processors that usually require intermediate transpose memory and have computation complexity $O(N^3)$, the proposed hardware-efficient architecture with distributed memory structure has computation complexity $O(N^2 \log_2 N)$ and requires only $\log_2 N$ multipliers. The new pipelinable and scalable 2-D DCT/IDCT processor uses storage elements local to the processing elements and thus does not require any address generation hardware or global memory-to-array routing.

11.
A VLSI Architecture for Separable 2-D Discrete Wavelet Transform   Total citations: 2, self-citations: 0, citations by others: 2
In this paper, an efficient semi-systolic array architecture for the separable 2-D Discrete Wavelet Transform (DWT) is introduced. The semi-systolic array is applicable to any convolution that requires an arbitrary subsampling function, and it provides a better implementation of the convolution function of the DWT. This kind of implementation offers higher efficiency than a regular systolic implementation when applied to the 2-D DWT. The architecture has an efficiency of at least 91%, which increases in proportion to the number of octaves with no change in the architecture design except for minor modifications to the control logic and memory size. The proposed architecture is scalable to different filter sizes and different numbers of octaves. The communication routing is minimal since data transfers are limited to immediately neighboring processors. The components of the architecture are fairly regular and consist of a minimum number of computational units, which makes it a good candidate for VLSI implementation.

12.
This paper investigates efficient hardware architectures for implementation of 1-D and 2-D discrete wavelet transforms (DWTs). The architectures are based on the lifting scheme. We propose a general structure to minimize the number of multipliers and adders for 1-D DWTs. Compared to previous conventional architectures, the architecture presented here is more efficient in terms of the required arithmetic units. Moreover, we describe a new frame scan method for a block-based 2-D DWT structure which provides a flexible trade-off between the required internal memory size and external memory access. In contrast, other 2-D DWT structures require a fixed memory size.

13.
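A fast and efficient hardware implementation of the single-level two-dimensional wavelet transform   Total citations: 2, self-citations: 1, citations by others: 1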
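A hardware platform for the single-level 2-D wavelet transform based on the 9/7 wavelet filter is proposed. The overall structure is pipelined and the data are input in groups. The column transform uses multiple wavelet transform units, while the row transform module is a reconfigurable hardware structure, and no on-chip memory is needed between the row and column transforms. Compared with existing structures, this structure achieves higher processing speed while consuming fewer hardware resources.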

14.
A framework for systematically mapping 2-dimensional (2-D) separable transforms into a parallel architecture consisting of fully pipelined linear array stages is presented. The resulting model architecture is characterized by its generality, high degree of modularity, high throughput, and the exclusive use of distributed memory and control. There is no central shared memory block to facilitate the transposition of intermediate results, as is commonly the case in row-column image processing architectures. Avoiding shared central memory has positive implications for the speed, area, power dissipation, and scalability of the architecture. The architecture presented here may be used to realize any separable 2-D transform by changing only the coefficients stored in the processing elements. Pipelined linear arrays for computing the 2-D Discrete Fourier Transform and 2-D separable convolution are presented as examples, and their performance is evaluated.

15.
Efficient architectures for 1-D and 2-D lifting-based wavelet transforms   Total citations: 4, self-citations: 0, citations by others: 4
The lifting scheme reduces the computational complexity of the discrete wavelet transform (DWT) by factoring the wavelet filters into cascades of simple lifting steps that process the input samples in pairs. We propose four compact and efficient hardware architectures for implementing lifting-based DWTs, namely, one-dimensional (1-D) and two-dimensional (2-D) versions of what we call recursive and dual scan architectures. The 1-D recursive architecture exploits interdependencies among the wavelet coefficients by interleaving, on alternate clock cycles using the same datapath hardware, the calculation of higher order coefficients along with that of the first-stage coefficients. The resulting hardware utilization exceeds 90% in the typical case of a five-stage 1-D DWT operating on 1024 samples. The 1-D dual scan architecture achieves 100% datapath hardware utilization by processing two independent data streams together using shared functional blocks. The recursive and dual scan architectures can be readily extended to the 2-D case. The 2-D recursive architecture is roughly 25% faster than conventional implementations, and it requires a buffer that stores only a few rows of the data array instead of a fixed fraction (typically 25% or more) of the entire array. The 2-D dual scan architecture processes the column and row transforms simultaneously, and the memory buffer size is comparable to existing architectures.
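As a concrete illustration of lifting steps that process the samples in even/odd pairs, the sketch below implements one level of the reversible integer 5/3 transform used in JPEG 2000: a predict step followed by an update step, plus the inverse. The choice of the 5/3 filter, the whole-sample symmetric boundary handling, and the function names are assumptions made for this example; the abstract does not say which filters or boundary rules the proposed architectures use.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform (even-length input).

    Returns (s, d): low-pass and high-pass coefficients.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict: subtract from each odd sample the floor-average of its two
    # even neighbours (the right edge is mirrored).
    d = [odd[k] - (even[k] + even[min(k + 1, half - 1)]) // 2
         for k in range(half)]
    # Update: add a rounded quarter of the neighbouring details to each
    # even sample (the left edge is mirrored).
    s = [even[k] + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    odd = [d[k] + (even[k] + even[min(k + 1, half - 1)]) // 2
           for k in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    samples = [3, 7, 1, 8, 2, 9, 4, 6]
    low, high = forward_53(samples)
    print(low, high, inverse_53(low, high) == samples)
```

Because each lifting step only adds a function of the other half of the samples, the inverse simply subtracts the same quantity, which is why reconstruction is exact even with integer rounding.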

16.
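To address the large amount of memory occupied between the wavelet transform and the coder in JPEG2000 hardware implementations, this paper proposes a code-block-based storage scheme. By reusing code-block-sized on-chip memory to the greatest possible extent and by scheduling it with efficient, simple control, the scheme reduces hardware cost in terms of both area and power. The implementation uses a line-based lifting transform structure and bit-plane-parallel coding, which improves efficiency and ensures real-time processing throughout. Experimental results show that, under real-time coding requirements, for a four-level 9/7 or 5/3 wavelet decomposition of image tiles with a resolution of 512 × 512 and a code-block size of 32 × 32, the proposed structure uses over 80% less memory than the approach that directly uses external memory. The whole structure has been verified on an FPGA, and the system clock can run at 100 MHz.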

17.
Discrete Wavelet Transform: Architectures, Design and Performance Issues   Total citations: 3, self-citations: 0, citations by others: 3
Due to the demand for real-time wavelet processors in applications such as video compression [1], Internet communications compression [2], object recognition [3], and numerical analysis, many architectures for Discrete Wavelet Transform (DWT) systems have been proposed. This paper surveys the different approaches to designing DWT architectures. The types of architectures depend on whether the application is 1-D, 2-D, or 3-D, as well as the style of architecture: systolic, semi-systolic, folded, digit-serial, etc. This paper presents an overview and evaluation of the architectures based on the criteria of latency, control, area, memory, and number of multipliers and adders. This paper will give the reader an indication of the advantages and disadvantages of each design.

18.
Three-dimensional discrete wavelet transform architectures   Total citations: 2, self-citations: 0, citations by others: 2
The three-dimensional (3-D) discrete wavelet transform (DWT) suits compression applications well, allowing for better compression on 3-D data as compared with two-dimensional (2-D) methods. This paper describes two architectures for the 3-D DWT, called the 3DW-I and the 3DW-II. The first architecture (3DW-I) is based on folding, whereas the 3DW-II architecture is block-based. Potential applications for these architectures include high definition television (HDTV) and medical data compression, such as magnetic resonance imaging (MRI). The 3DW-I architecture is an implementation of the 3-D DWT similar to folded 1-D and 2-D designs. It allows even distribution of the processing load onto 3 sets of filters, with each set performing the calculations for one dimension. The control for this design is very simple, since the data are operated on in a row-column-slice fashion. Due to pipelining, all filters are utilized 100% of the time, except for the start-up and wind-down times. The 3DW-II architecture uses block inputs to reduce the requirement of on-chip memory. It has a central control unit to select which coefficients to pass on to the lowpass and highpass filters. The memory on the chip will be small compared with the input size since it depends solely on the filter sizes. The 3DW-I and 3DW-II architectures are compared according to memory requirements, number of clock cycles, and processing of frames per second. The two architectures described are the first 3-D DWT architectures.

19.
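VLSI implementation of the two-dimensional discrete wavelet transform   Total citations: 1, self-citations: 0, citations by others: 1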
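Wavelet-transform image coding achieves better image quality and higher compression ratios than traditional DCT-based coding. However, the real-time 2-D wavelet transform requires a large amount of computation, so the design of dedicated wavelet transform chips has become a key technology in wavelet image coding. This paper proposes a high-speed VLSI structure for the 2-D wavelet transform. Following a modular design approach, a set of basic modules for the 2-D wavelet transform is designed. By assembling these modules as the transform requires, a multi-level 2-D wavelet transform is realized. The corresponding Verilog HDL models were written, and simulation and logic synthesis were carried out.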

20.
Many VLSI architectures for computing the discrete wavelet transform (DWT) have been presented, but parallel input data sequences and the programmability of the 2-D DWT have rarely been addressed. In this paper, we present a parallel-processing VLSI architecture to compute the programmable 2-D DWT, including various wavelet filter lengths and various wavelet transform levels. The proposed architecture is very regular and easy to extend. To eliminate high-frequency components, the pixel values outside the boundary of the image are mirror-extended as in the symmetric wavelet transform (SWT), and the mirror extension is realized via the routing network. Owing to the parallel processing, we adopt the row-based recursive pyramid algorithm (RPA), similar to the 1-D RPA, for data scheduling. This design has been implemented and fabricated in a 0.35 μm 1P4M CMOS technology, and the working frequency is 50 MHz. The chip size is about 5200 μm × 2500 μm. For a 256 × 256 image, the chip can process 30 frames per second with the filter length varying from 2 to 20 and with various levels. The proposed architecture is suitable for real-time applications such as JPEG 2000.
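The mirror extension mentioned in this abstract can be pictured in software as padding each image row with reflected copies of its own border pixels before filtering. The sketch below assumes whole-sample symmetric reflection (the border pixel itself is not repeated) and a hypothetical helper name; the abstract does not specify which symmetry convention the chip's routing network implements.

```python
def mirror_extend(row, pad):
    """Pad a row with `pad` reflected samples on each side.

    Whole-sample symmetry is assumed: the first and last samples act as
    the mirror axes and are not duplicated.
    """
    n = len(row)
    if not 0 <= pad < n:
        raise ValueError("pad must satisfy 0 <= pad < len(row)")
    left = [row[i] for i in range(pad, 0, -1)]    # row[pad], ..., row[1]
    right = [row[n - 2 - i] for i in range(pad)]  # row[n-2], row[n-3], ...
    return left + list(row) + right


if __name__ == "__main__":
    print(mirror_extend([1, 2, 3, 4], 2))
```

A filter of length L centred on a border pixel reaches roughly L/2 samples past the edge, so in this picture `pad` would grow with the programmable filter length.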
